# Re: Election Data

• From: "CLAY S" <clay brokenladder com>
• Subject: Re: Election Data
• Date: Sun, 27 Jul 2008 23:56:31 -0700

On Sun, Jul 27, 2008 at 23:13, Nigel Jones <dev nigelj com> wrote:
a) Assuming you are going to do a similar publication to that of the Haiku thing, then a very simple answer: if you're trying to prove a point, then you're not doing a very good job. 'and a lot of "abstentions." Problem is, we can't tell abstentions from 0's. (There are no 0's, so that's my assessment.)' This is exactly how any data you'd get from Fedora Project would be (0 = abstain OR no immediate preference).

I'm surprised your interface doesn't have a clear distinction between a zero and an abstention.  But I'm not sure what point you're trying to make.  Not being able to tell 0's from abstentions is an "inadequacy" in the data.  But there is still lots of valuable information in that data set.  Even if we do not know for certain that a ballot with _no_ intermediate scores was intentionally strategically exaggerated, it is still plausible that it was, and such a ballot is still quite relevant to election researchers.
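To make that concrete, here is a minimal sketch of how a researcher might flag such ballots. It assumes each ballot is just a list of integer scores and that the maximum score is known; the function name and category labels are hypothetical, not anything from the Fedora voting application. Because a 0 may mean either "zero" or "abstain", an all-zero ballot is reported separately rather than guessed at.

```python
def classify_ballot(scores, max_score):
    """Classify one ballot (a list of integer scores, hypothetical layout).

    A stored 0 may be a real zero or an abstention; the data cannot tell us,
    so this only reports what is observable.
    """
    if all(s == 0 for s in scores):
        # Could be an all-abstain ballot or a genuine all-zero ballot.
        return "all-zero"
    if all(s in (0, max_score) for s in scores):
        # No intermediate scores: consistent with (not proof of) exaggeration.
        return "min-max only"
    return "has intermediate scores"


if __name__ == "__main__":
    for ballot in ([0, 0, 0], [5, 0, 5], [5, 3, 0]):
        print(ballot, "->", classify_ballot(ballot, max_score=5))
```

A "min-max only" label is only an observation about the ballot, not a claim about the voter's intent.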

, where _I'VE_ voted zero in the ballots, I would have voted zero even if I did have a no preference option.

Then the data set would have registered your zero, which is precisely what we'd want.  Again I'm not sure what your point is.

So my concern here is:
You can't compare apples with oranges and any attempt to do so would offend me as a user of Fedora

Can you put that into a formal mathematical statement?  Telling us that we can't compare "apples and oranges" and that it offends you is most uninformative.

it also makes me disregard the rest of the 'study' as phony because it gives no statistical burden of proof

It's not phony.  It's real data.  And please tell me, in as formal language as possible, what a "statistical burden of proof" is.  Could you be talking about a confidence interval perhaps?

b) It's a bit steep to just come along and ask for _complete_ voting records (anonymous or not). Why haven't you asked for a simple random sample (SRS)? I would have thought of any such request much more favourably (in light of point a) than not.

I specifically raised that option in one of the earliest emails in this thread.  I put it something like, "a random subset of the ballots".
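For what it's worth, drawing such a subset is trivial. The sketch below assumes the anonymized ballots are already available as a Python list of score lists; the function name `sample_ballots` and the example data are hypothetical. `random.sample` draws without replacement and returns the ballots in selection order, so the sample also carries no trace of the order in which votes were cast.

```python
import random


def sample_ballots(ballots, k, seed=None):
    """Return a simple random sample of k ballots, without replacement.

    `ballots` is assumed to be a sequence of anonymized ballots
    (for example, lists of scores). A seed makes the draw repeatable.
    """
    rng = random.Random(seed)
    # random.sample raises ValueError if k exceeds the number of ballots.
    return rng.sample(list(ballots), k)


if __name__ == "__main__":
    all_ballots = [[5, 0, 3], [0, 0, 5], [5, 5, 0], [2, 4, 1], [0, 5, 5]]
    print(sample_ballots(all_ballots, k=3, seed=2008))
```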

c) There is the privacy of the votes, I make my vote in confidence (unless I'm told beforehand) that my choices are between me and no one else, and will ONLY be seen for purposes of tallying the votes.  Why should we treat our users' votes any differently than a responsible government (i.e. not Zimbabwe) would?  There is _no_ reason why, and any attempt to change that I'd find disgraceful.

With all due respect, this seems irrational to me if the data is anonymous, especially if a random sample is provided.  If even a voter _himself_ cannot tell that a particular ballot in the data set is his, clearly his privacy has not been violated.

Although as I already said, I'd be more than happy to wait until future elections have a notice that the anonymized ballot data will be used for transparency/research.

Oh and please none of the "Fedora is open" yada-yada because I know that, I've known that for ages, but there is no way anyone can say everything is 100% transparent because the Board, being effectively a governing body of Fedora, holds meetings in private; they release the aggregate information (what they can) but they can't release everything...  Why?  Because there are just some things that _shouldn't_ be aired in the public arena; what can be is kept for town-hall meetings.

Agreed.  Can you think of any sound reasons the ballot data shouldn't be aired in public?

My thoughts for the Board's consideration:
- Why should a vote in a Fedora election be any different to that of nearly any governing body (heck, Public Companies don't release non-aggregate vote data that I've seen)?

That's a logical fallacy - an appeal to tradition/popularity.  If the evidence says that disclosing the anonymized data is harmless and even has some benefits, then it is _those companies_ who are doing it wrong, and so citing them as an example doesn't bolster your case.

- Please consider future implications of such a move (to release data)

Are there some negative implications you can think of?

- Can we _please_ create some sort of policy to protect the voting data (at the very least as a whole) in retrospect and for the future?

Why not just create a policy of telling the voters beforehand that the data will be made public but anonymous?

On another note, it should be made clear, that even I have _never_ seen individual vote information (anonymously or associated), there has been no reason to (yes I do check for invalid votes, but none have been displayed and the queries have always been designed to avoid showing valid data).

Again I fail to see what point you're making.

Yes, and people vote with the knowledge that such releases happen, I honestly don't see anything wrong from their perspective.  As for immediately discounting as 'worse', I don't see how you can claim that, it's worked fairly well for them and I'm all for what works.

You have no way of knowing that "it's worked fairly well for them", since you can't read the voters' minds to calculate the Bayesian regret they experienced through it.  That can only be gotten via computer simulation, where you _can_ read the voters' minds.

http://rangevoting.org/BayRegDum.html

Some sample B.R. figures for Condorcet and scoring:

    Magically elect optimum winner                      100.00%
    Range (honest voters)                                96.71%
    Condorcet-LR (honest voters)                         85.19%
    Range & Approval (strategic exaggerating voters)     78.99%
    Condorcet-LR (strategic exaggerating voters)         42.56%
    Elect random winner                                   0.00%
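
For anyone who wants to see how figures of this kind are produced, here is a toy sketch. It is not the simulator behind the numbers above: it assumes utilities drawn uniformly at random, fully honest range voters who rescale their utilities to a 0..10 ballot, and it reads the percentages as normalized so that a random winner scores 0% and the optimum winner 100%. All function names and parameters are made up for illustration.

```python
import random


def honest_range_ballot(utilities, max_score=10):
    """Linearly rescale one voter's utilities to integer scores 0..max_score."""
    lo, hi = min(utilities), max(utilities)
    if hi == lo:
        return [0] * len(utilities)
    return [round(max_score * (u - lo) / (hi - lo)) for u in utilities]


def simulate(num_elections=2000, num_voters=20, num_candidates=5, seed=0):
    """Return (avg regret of honest range voting, avg regret of a random winner)."""
    rng = random.Random(seed)
    regret_range = 0.0
    regret_random = 0.0
    for _ in range(num_elections):
        # In a simulation we *can* read the voters' minds: these are the utilities.
        utils = [[rng.random() for _ in range(num_candidates)]
                 for _ in range(num_voters)]
        social = [sum(v[c] for v in utils) for c in range(num_candidates)]
        best = max(social)
        ballots = [honest_range_ballot(v) for v in utils]
        totals = [sum(b[c] for b in ballots) for c in range(num_candidates)]
        winner = totals.index(max(totals))  # ties go to the lowest index
        regret_range += best - social[winner]
        # Expected regret if the winner were picked uniformly at random.
        regret_random += best - sum(social) / num_candidates
    return regret_range / num_elections, regret_random / num_elections


def percent_of_optimum(regret, regret_random):
    """Normalize regret so random winner = 0% and optimum winner = 100%."""
    return 100.0 * (1.0 - regret / regret_random)


if __name__ == "__main__":
    r_range, r_random = simulate()
    print("Range (honest voters): %.2f%%" % percent_of_optimum(r_range, r_random))
```

The output of this toy model will not match the table, since the utility model, voter counts, and strategies all differ.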

-clay