Thursday, April 17, 2014

Document: Implementing Risk-Limiting Post-Election Audits in California (ppt)




minimum probability of a full hand count whenever the electoral outcome is wrong, the audit is not
risk-limiting. The initial sample size is not important for controlling the risk (note 6) as long as
there is a proper calculation of the strength of the evidence that the outcome is correct, and the
audit is expanded if the evidence is not strong, eventually to a full manual count.
Heuristically, the evidence that the outcome is correct is weak if the sample size is small, if the
margin is small, or if the initial audit finds too many errors. The difficulty is in making these heuristics
precise—the problem addressed by the various papers on risk-limiting audits [18, 20, 17, 21, 11, 19].
As illustrated in section 3, efficient risk-limiting methods have unavoidable complexity that might make
them unsuitable for broad use, although we are hopeful that better “data plumbing” will help.
2.2.1 Existing State Legislation
The most common prescription for PEMT audits involves selecting a pre-determined percentage of
batches of ballots (e.g., precincts, machines, districts), counting the votes in those batches, and
stopping (note 7). A notable exception is North Carolina, where the manual audit statute requires the
audit sample size to be “chosen to produce a statistically significant result and shall be chosen
after consultation with a statistician” (note 8). Unfortunately, this is a misuse of the term of art
“statistically significant.” The wording does not make sense to a statistician.
New Jersey’s PEMT audit law (note 9) tries to enunciate risk-limiting audit principles; indeed, a
co-author of this legislation claims it is “risk-based” (note 10). The statute creates an “audit team”
to oversee manual audits of voter-verified paper records and requires that the procedures the team
adopts:
. . . ensure with at least 99% statistical power that for each federal, gubernatorial or other
Statewide election held in the State, a 100% manual recount of the voter-verifiable paper
records would not alter the electoral outcome reported by the audit. . . (note 11)
This misuses the statistical term of art “power”: The language does not make sense to a statistician.
Since New Jersey’s current voting equipment does not produce an audit trail, the New Jersey audit law
cannot help ensure accuracy (note 12).

Note 6: The initial sample size can affect the efficiency, though.
Note 7: The authors are aware of the following state-level post-election audit provisions that use
tiered- or fixed-percentage audit designs: Alaska specifies one precinct per election district that
must consist of at least 5% of ballots cast (Alaska Stat. § 15.15.430 (2009)); Arizona specifies the
greater of two percent of precincts or two precincts (A.R.S. § 16-602 (2008)); California specifies 1%
of precincts (Cal Elec Code § 15360 (2008)); Colorado specifies no less than 5% of voting devices
(C.R.S. 1-7-514 (2008)); Connecticut specifies no less than 10% of voting districts (Conn. Gen. Stat.
§ 9-320f (2008)); Florida specifies no less than 1% but no more than 2% for one randomly-selected
contest (Fla. Stat. § 101.591 (2009)); Hawaii specifies no less than 10% of precincts (HRS § 16-42
(2008)); Illinois specifies 5% of precincts (10 ILCS 5/24A-15 (2009)) (allows machine retabulation);
Kentucky specifies between 3–5% of the number of total ballots cast (KRS § 117.383 (2009)); Minnesota
specifies 2 precincts, 3 precincts, 4 precincts or at least 3% of precincts per jurisdiction,
depending on the number of registered voters (Minn. Stat. § 206.89 et seq. (2008)); Missouri specifies
in its state administrative rules the greater of 5% of precincts or one precinct (15 CSR
30-10.110(2)); Montana specifies at least 5% of precincts and at least one federal office, statewide
office, statewide legislative office, and one statewide referendum (2009 Mt. SB 319); Nevada specifies
in administrative rules between 2–3% depending on the jurisdiction’s population (Nevada Administrative
Code, Ch. 293.255) (allows machine retabulation); New Mexico specifies 2% of voting systems (N.M.
Stat. Ann. § 1-14-13.1 (2008)) (see further discussion in: 2.2.1); New York specifies 3% of voting
machines (NY CLS Elec § 9-211 (2009)); Oregon specifies a tiered audit structure of 3%, 5% or 10% of
precincts depending on the margin of the contest (ORS § 254.529 (2007)); Pennsylvania specifies the
lesser of 2000 or 2% of votes (25 P.S. § 3031.17 (2008)) (allows machine retabulation); Tennessee
specifies at least 3% of votes and at least 3% of precincts (Tenn. Code Ann. § 2-20-103 (2009)); Texas
specifies the greater of 3 precincts or 1% of precincts (Tex. Elec. Code § 127.201 (2009)); Utah
specifies at least 1% of machines (see: § 6 of [5]); Washington specifies up to 4% of machines (Rev.
Code Wash. (ARCW) § 29A.60.185 (2009)) (only 1% is required to be counted by hand); West Virginia
specifies 5% of precincts (W. Va. Code § 3-4A-28 (2008)); Wisconsin specifies 5 “reporting units” for
each voting system (see: [23] implementing Wis. Stat. § 7.08(6) (2008)) (audit occurs only after each
General Election). The following states’ audit laws do not require auditing of all contests on the
ballot: Arizona, Connecticut, Florida, Minnesota, Missouri, Montana, Tennessee, Washington and
Wisconsin. The District of Columbia recently issued an emergency rule requiring manual audits of 5% of
precincts (see: [16] at 4). Vermont has no legal requirement for manual audits but the Secretary of
State may order them under certain conditions (17 V.S.A. § 2493 (2009)). The Ohio Secretary of State
ordered a 5% manual audit for the November 2008 General Election using her power of Directive (See:
[4]). The Verified Voting Foundation (VVF) maintains a useful and regularly-updated dossier of these
provisions [16].
Note 8: N.C. Gen. Stat. § 163-182.1–182.2 (2009).
Note 9: N.J. Stat. § 19:61-9 (2009).
Note 10: Stanislevic calls the N.J. law the first “risk-based statistical audit law.” See: Howard
Stanislevic, “Election Integrity: Fact & Friction”, at: http://e-voter.blogspot.com/.
Note 11: N.J. Stat. § 19:61-9(c)(1) (2009).

The New Jersey statute goes on to say that auditors may adopt “scientifically reasonable assumptions,”
including:
. . . the possibility that within any election district up to 20% of the total votes cast may
have been counted for a candidate or ballot position other than the one intended by the
voters . . (note 13)
This assumption is sometimes called a within-precinct-miscount or within-precinct-maximum (WPM)
bound (note 14). The New Jersey rule corresponds to a WPM of 20%.
The chance that a random sample will find one or more batches with error depends on the number
of batches that have error: the more batches with errors, the greater the chance. The number of batches
that must have errors for the apparent electoral outcome to be wrong depends on the amount of error
each batch can hold (and on the margin). If batches can hold large errors, few batches need to have
errors for the outcome to be wrong.
WPM limits the amount of error that each batch can hold—by assumption. WPM implies that if there
is enough error to change the outcome, the error cannot be “concentrated” in very few batches: There
is a minimum number of batches that must have error if the apparent outcome is wrong. In turn, that
implies that if the outcome is wrong, a sample of a given size has a calculable minimum chance of
finding at least one batch with an error. If the WPM assumption fails, however, outcome-changing error
can hide in fewer batches. Then the chance that a sample of a given size finds a batch with errors is
smaller than the WPM calculation suggests: The chance of noticing that there is something wrong is
smaller than claimed.
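
The argument above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a simple random sample of batches drawn without replacement and a single cap on how many votes of margin the error in one batch can hide. The function names and the example numbers are hypothetical and do not come from any statute or audit discussed here.

```python
from math import comb


def min_tainted_batches(margin_votes: int, max_overstatement_per_batch: int) -> int:
    """Fewest batches that must contain error for the outcome to be wrong,
    if no single batch can overstate the margin by more than the cap."""
    # Integer ceiling division: ceil(margin_votes / cap).
    return -(-margin_votes // max_overstatement_per_batch)


def detection_probability(n_batches: int, n_tainted: int, sample_size: int) -> float:
    """Chance that a simple random sample of `sample_size` batches contains
    at least one of the `n_tainted` batches that have error."""
    # P(no tainted batch in sample) = C(N - k, n) / C(N, n); comb() returns 0
    # when the sample is larger than the number of clean batches.
    miss = comb(n_batches - n_tainted, sample_size) / comb(n_batches, sample_size)
    return 1.0 - miss


# Hypothetical contest: 100 batches, a 400-vote margin, and a WPM-style cap of
# 80 votes of margin overstatement per batch.
k = min_tainted_batches(400, 80)
p_claimed = detection_probability(100, k, 10)

# If the cap is wrong and the same error hides in only 2 batches, the chance
# of noticing anything is smaller than the WPM calculation claims.
p_actual = detection_probability(100, 2, 10)
print(k, round(p_claimed, 3), round(p_actual, 3))
```

The comparison between `p_claimed` and `p_actual` is the point: the guarantee is only as good as the assumed per-batch cap.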
We find WPM assumptions neither reasonable nor defensible. There is no empirical or theoretical
support for the assumption that no more than 20% of ballots in a batch can be counted incorrectly, nor
that an error of more than 20% would always be caught without an audit. In fact, there is evidence to
the contrary, including the recent experience in Humboldt County, mentioned above, where 100% of the
ballots in a batch were omitted (note 15). The WPM assumption generally understates the amount of
error that an auditable unit can contain (note 16). Because WPM is not rigorous and tends to be
optimistic, audits that rely on WPM tend to understate the true risk, creating a false sense of
security.
Three other recently proposed laws are similar to the New Jersey legislation. New Mexico State
Senate Bill 72, recently signed into law, has language that sounds risk-limiting: It requires the sample
to ensure with “at least ninety percent probability [. . . ] that faulty tabulators would be detected if they
would change the outcome of the election for a selected office.” Faulty tabulators are not the only reason
apparent outcomes can be wrong. And the word “detected” is a problem (note 17). There is a big
difference between detecting error and determining that the aggregate error might be large enough to
change the apparent electoral outcome; detecting error and requiring a full hand count are not the
same. An audit does not limit risk unless it leads to a full hand count whenever there is less than
compelling evidence that the apparent outcome is correct, regardless of the reason the evidence is not
strong. Most laws have no provision for expanding the audit even if the audit uncovers large errors.

Note 12: As in New Jersey, manual audits are required by law in Kentucky and Pennsylvania but neither
state requires auditable voting systems. Depending on the type of voting technology, there may or may
not be anything to count by hand.
Note 13: Id.
Note 14: The term “WPM” suggests that the audit unit is a precinct, but often the term is used more
broadly to denote an upper bound on the number of errors in an auditable batch as a percentage of the
reported number of ballots or votes in the batch. “WBM” (within-batch-miscount) might be a better term.
Note 15: The Humboldt case was not detected by a PEMT audit. However, it proves that error can affect
every ballot in a batch and yet go undetected during the canvass.
Note 16: A 20% bound on error can be optimistic or conservative, depending on whether there has been an
accounting of ballots and depending on the distribution of reported votes, even within a single
jurisdiction. Typically, however, it is optimistic.
Note 17: It is not the only problem with the New Mexico law: The law “hardwires” sample sizes in a
look-up table that appears to depend on a WPM-like error bound based on a snapshot of New Mexico
precinct sizes. The final text of SB 72 is available here:
http://www.nmlegis.gov/Sessions/09%20Regular/final/SB0072.pdf. This bill was signed into law by New
Mexico Governor Richardson on 7 April 2009. See:
http://www.governor.state.nm.us/press/2009/april/041009_07.pdf. The law has not, at the time of
writing, been codified into New Mexico’s Election statutes (N.M. Stat. Ann. § 1-13 et seq.).

Massachusetts Senate Bill 356, and its companion House Bill 652, have what appears to be good
risk-limiting language (note 18).
The Senate Bill states: “. . . the audit shall be designed and implemented to
provide approximately a 99% chance that a hand recount of 100% of the ballots will occur whenever
such a recount would reverse the preliminary outcome reported by the voting system” (note 19). The
term “approximately” is not defined; it is unclear how much deviation from the target probability is
tolerable. The bill has other problems, too: It does not audit all races and it relies on a 25% WPM
assumption. The House bill is much better: It does not use the “approximately” language, nor does it
involve any WPM assumption.
Maryland House of Delegates Bill HB 665 appears similar to the New Mexico bill (note 20). It lacks
language comparable to the risk language in the New Jersey and New Mexico laws (note 21).
2.2.2 Emerging State Legislation
Some state legislation and regulation come closer to mandating features of risk-limiting audits. Alaska,
California, Hawaii, Minnesota, New York, Oregon, Tennessee, and West Virginia hand count additional
precincts or machines, in some cases potentially to a full count, depending on the error found during
the audit. Colorado recently passed an audit law that almost requires a risk-limiting audit. In this
section we discuss the differences among these state-level schemes.
Five of these States—Alaska, Hawaii, Oregon, Tennessee, and West Virginia—have audit laws that
can escalate to a full count, but they do so using fairly blunt methods:
• Alaska requires counting one randomly selected precinct from each election district within the
state (note 22). If the audit finds discrepancy amounting to 1% between the hand count and the
preliminary results, the audit expands to all ballots.
• Hawaii requires an audit of 10% of precincts (note 23). If the audit finds any discrepancy, the law
requires election officials to conduct an “expanded audit”; however, the extent of the expanded audit
is not specified.
• Oregon requires a tiered initial audit of the ballots in 3%, 5% or 10% of precincts where the margin
in a given race is greater than 2%, between 1% and 2% or less than 1%, respectively (note 24). If the
audit finds discrepancy between the hand count and the preliminary results of 0.5% or more, the count
has to be conducted again. If this level of discrepancy is confirmed by the second count, all ballots
counted by the voting system on which these ballots were cast within the jurisdiction are counted.
• Tennessee requires a hand count of 3% of precincts (note 25). If the difference between the hand
count and electronic results is more than 1%, the audit is expanded by an additional 3% of precincts.
Unfortunately, if the expanded audit still finds error amounting to a 1% difference, the law here only
“authorizes” the election officials to count additional precincts as they “consider appropriate.”
• West Virginia requires a manual count of VVPAT records in 5% of precincts (note 26). When the
resulting hand count differs from the electronic results by more than one percent or when it results
in a different outcome, the law requires all VVPAT records to be manually counted.

Note 18: See: Massachusetts S.B. 356: http://www.mass.gov/legis/bills/senate/186/st00pdf/st00356.pdf;
Massachusetts H.B. 652: http://www.mass.gov/legis/bills/house/186/ht00pdf/ht00652.pdf.
Note 19: Id. This is the risk-limiting language specific to statewide contests; for congressional
races the probability is lowered to 90%.
Note 20: It also tabulates sample sizes, but the table is more detailed.
Note 21: This bill appears to have received no further action after its first reading. See:
http://mlis.state.md.us/2009rs/billfile/HB0665.htm.
Note 22: Id., note 7.
Note 23: Id., note 7.
Note 24: Id., note 7.
Note 25: Id., note 7.
Note 26: Id., note 7.

California, where we performed the audits described in this paper and in other work [11, 21, 6], has
regulations that expand the hand count if enough error is found during the audit. For almost 45 years,
California has had a PEMT that audits a random sample of 1% of precincts (note 27). In the wake of
studies by
the Secretary of State’s Top-To-Bottom Review [22] and Post-Election Audit Standards Working Group [8],
additional auditing requirements were imposed in 2007 as a condition of recertification for electronic
voting systems. The new rules were challenged in court and the Secretary has since issued the Post-
Election Manual Tally Regulations [3] as emergency regulations. Although the emergency rules are
not risk-limiting, they have the right flavor: They require more auditing for close contests and they
expand the audit—potentially to a full hand count—if the audit uncovers many errors that overstated
the margin.
Jurisdictions in Minnesota must tally votes in 2, 3 or 4 precincts, or 3% of precincts, depending on
the number of registered voters in the jurisdiction (note 28). Minnesota law says the audit must
escalate by three precincts if it “reveals a difference greater than one-half of one percent, or
greater than two votes in a precinct where 400 or fewer voters cast ballots” (note 29). If this first
escalation finds a similar or greater amount of error in the same jurisdiction, the audit then
escalates to encompass all precincts in the county. As a third and final escalation step, the
Secretary of State must order a full recount of any race where results appear to be incorrect, after
these two stages of escalation, if these errors occurred in counties that comprise more than ten
percent of the vote count, in aggregate (note 30). These elements of
the Minnesota law reduce risk: If enough error is found during the hand count, the audit can grow to
encompass the entire race, even in races that cross jurisdictional boundaries. However, the resulting
risk still can be quite high, because the law does not take sampling variability into account, because it
requires finding large errors in several precincts in each jurisdiction, and because the sampling fractions
and escalation thresholds are fixed, even for contests with very small margins.
New York’s audit laws require the New York State Board of Elections to promulgate regulations that
determine when to increase the number of voting systems in the audit and when to do a full count of the
audit records for all voting systems (note 31). These regulations are currently available for public
comment and review (note 32). The proposed regulations require a 3% audit of all voting systems and
trigger an expanded audit of the records from an additional 5% if any vote share changes by 0.1% or if
an error occurs in at least 10% of machines in the initial sample. The audit then expands in a similar
manner to include paper records from an additional 12% and then finally encompasses all machines.
Each of these states has provisions for enlarging audits to a full hand tally, depending on the
frequency and location of errors the audit finds. California, New York, and Minnesota tend to reduce
risk, although not to any pre-specified level and not for every contest (note 33).
Finally, Colorado recently passed legislation that comes close to mandating risk-limiting audits.
HB 1335 requires all counties to conduct what it calls “risk-limiting” audits by 2014, and establishes a
pilot program to develop procedures and regulations (note 34). HB 1335 defines “risk-limiting audit” as:
“risk-limiting audit” means an audit protocol that makes use of statistical methods and is
designed to limit to acceptable levels the risk of certifying a preliminary election outcome
that constitutes an incorrect outcome. (note 35)

Note 27: Id., note 7. In small races, the law can require auditing substantially more than 1% of
precincts because it calls for auditing at least one precinct in every race. For instance, a 4
precinct race would have at least 1 precinct audited, resulting in at least a 25% audit. The new
California PEMT regulations [3], discussed in the text, call for a 100% manual tally of all ballots
cast on DRE voting systems.
Note 28: Id., note 7. Jurisdictions with more than “100,000 registered voters must conduct a review of
a total of at least four precincts, or three percent of the total number of precincts in the county,
whichever is greater.” (Minn. Stat. § 206.89(2)).
Note 29: Minn. Stat. 206.89(a) (2008).
Note 30: Minn. Stat. 206.89(b) (2008).
Note 31: Id., note 7.
Note 32: See: “Proposed Amendment to Subtitle V of Title 9 of the Official Compilation of Codes, Rules
and Regulations of the State of New York Repealing Part 6210.18 and Adding thereto a new Part, to be
Part 6210.18 Three-Percent (3%) Audit”, New York State Board of Elections, 29 May 2009,
http://www.elections.state.ny.us/NYSBOE/Law/6210.18Regulations.pdf.
Note 33: While these provisions tend to reduce risk, they are not risk-limiting: California’s
regulation only triggers increased auditing when the margin of victory is less than 0.5%. Contests
with larger margins of victory are not subject to auditing beyond the standard 1% PEMT audit, no
matter how much error the 1% audit finds. Minnesota’s law only audits races for U.S. President (or the
Minnesota Governor), U.S. Senator and U.S. Representative. No other contests on the ballot are subject
to the audit. New York’s proposed regulation does not coordinate audits across jurisdictional
boundaries for contests that span multiple counties to limit the risk of certifying an incorrect
outcome. New York does not require escalation to a full count across all types of voting technology
used to cast ballots in a contest, but instead confines escalation to the specific voting technology
in which errors are observed.
Note 34: HB 09-1335, “Concerning Requirements for Voting Equipment”, See:
http://www.leg.state.co.us/Clics/CLICS2009A/csl.nsf/fsbillcont3/25074590521F41DA87257575005F1422?Open&file=1335_enr.pdf.
HB 1335 was signed into law by Colorado Governor Ritter on 15 May 2009 (see: [14]).

This language comes closer to limiting the risk of certifying an incorrect outcome than do the proposals
discussed in the previous section.
However, it has problems. The phrase “statistical methods” serves to obfuscate, not clarify; “risk” is
not defined, and the definition of “incorrect outcome” given in the statute has a loophole:
“incorrect outcome” means an outcome that is inconsistent with the election outcome that
would be obtained by conducting a full recount. (note 36)
“Full recount” might allow machine re-tabulation in lieu of a full hand count of voter-verified ballot
records, which is the more appropriate standard for determining the “correct” electoral outcome.
Hence, a better legislative definition of “risk-limiting audit” is:
“risk-limiting audit” means an audit protocol that has an acceptably high probability of re-
quiring a full manual count whenever the electoral outcome of a full manual count would
differ from the preliminary election outcome. When the audit results in a full manual count,
the outcome of that count shall be reported as the official outcome of the contest.
That would be consistent with the consensus definition of “risk-limiting audit,” and still leave room for
legislators or elections officials to decide what “acceptably high” means.
2.2.3 Federal Legislation
Representative Rush Holt’s “Voter Confidence and Increased Accessibility Act” (H.R. 2894) is the leading
federal election reform bill to include PEMT audits (note 37).
Like Oregon’s legislation (note 38), the Holt bill has a tiered, margin-dependent sample size of 3%, 5% or 10%
of precincts when the margin in federal races is greater than 2%, between 1% and 2% or smaller than
1%, respectively. The bill allows escalation—but does not require it—if errors are discovered during the
audit. Because the audit need not progress to a full hand count even when large errors are found, the
Holt bill does not limit risk.
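
The tiered rule just described can be written in a few lines. This is a sketch, not statutory text: the function name is hypothetical, and the treatment of margins exactly equal to 1% or 2% is an assumption, since the description above does not say which tier those boundary cases fall into.

```python
def tiered_sampling_fraction(margin: float) -> float:
    """Fraction of precincts to audit for a contest, given its reported
    margin as a fraction of votes (0.015 means 1.5%)."""
    if margin < 0.01:       # margin smaller than 1%
        return 0.10
    if margin <= 0.02:      # between 1% and 2% (boundaries assumed inclusive)
        return 0.05
    return 0.03             # margin greater than 2%


for m in (0.005, 0.015, 0.05):
    print(f"margin {m:.1%}: audit {tiered_sampling_fraction(m):.0%} of precincts")
```

The sample size here depends only on the reported margin. Nothing in the rule responds to the errors the audit finds, which is why a tiered design without mandatory escalation cannot limit risk.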
The Holt bill has a clause that allows the National Institute of Standards and Technology (NIST) to
approve an alternative audit plan, provided NIST determines that:
(A) the alternative mechanism will be at least as statistically effective in ensuring the accu-
racy of the election results as the procedures under this subtitle; or
(B) the reported election outcome will have at least a 95 percent chance of being consistent
with the election outcome that would be obtained by a full recount. (note 39)
This language has problems. The Holt bill never requires a full hand count, so it cannot ensure the
accuracy of election results. In particular, there is no sense in which it is “statistically effective in
ensuring the accuracy of election results.” It would seem that to approve an alternative under (A), NIST
must concede that the Holt bill is not statistically effective.
Clause (B) looks more like a risk-limiting audit provision, but it is garbled to a statistician’s eye.
Absent another definition, we assume that “reported election outcome” means “apparent election out-
come.” The apparent outcome either is or is not the outcome a full recount would show. There is no
probability about it. The probability is only in the audit sample. So, clause (B) does not make sense.
Moreover, requiring “consistency” between the apparent outcome and what a full recount would
show seems too weak: It appears to permit an apparent outcome to be altered without a full hand
count. If so, there is a possibility that a correct outcome will be turned into an incorrect outcome
based on statistical evidence. That seems like it should be unacceptable. These problems could be
avoided by using the consensus definition of a risk-limiting audit: The alternative mechanism should
have at least a 95% chance of requiring a full hand count whenever that hand count would show that the
apparent outcome was wrong.

Note 35: Id., note 34.
Note 36: Id., note 34.
Note 37: H.R. 2894, “The Voter Confidence and Increased Accessibility Act”, 111th U.S. Congress (2009),
http://thomas.loc.gov/cgi-bin/bdquery/z?d111:h2894: (accessed Jun 18, 2009).
Note 38: Id., note 7.
Note 39: Id., note 37.

We hope that if the Holt bill passes, the NIST clause will be interpreted to allow risk-limiting audits.
Unfortunately, it is not clear that audits that satisfy the Holt provisions can be risk-limiting.
2.2.4 Boulder County, CO Audit, November 2008
For the November 2008 General Election in Boulder County, Colorado, the Boulder County Elections
Division was assisted by McBurnett in performing what he called a “risk-limiting” audit [10]. However, it
was not risk-limiting according to the consensus definition (note 40). It was designed in the
“detection” paradigm, not the “risk-limiting” paradigm.
Under the assumption that WPM of 20% holds (an assumption we find unconvincing), the Boulder
County audit had a large chance of finding one or more errors if the outcome were wrong (in local
races only, since errors in other counties were invisible to the audit). The number of batches to be
audited for local races was capped at 10, so the chance of finding at least one error if the outcome
was wrong differed from local contest to local contest, depending on the margin, among other things.
The 10-batch limit was imposed so that auditing a close, small contest would not require hand counting
the votes of every batch of ballots in the race (note 41).
The Boulder audit did not have escalation rules—provisions for what to do if error was found. Hence,
it did not ensure any chance of a full hand count if the apparent outcome was wrong. The audit was
constructed so if the outcome were wrong, there was a large chance of finding at least one error. The
audit did find error in some contests. Given the design, to be risk-limiting the audit had to escalate to a
complete hand count of every race in which the initial sample found one or more errors, even assuming
WPM of 20% held.
3 Risk-Limiting Audits in California
We performed four risk-limiting audits in California in 2008: two in Marin County and one each in Yolo
and Santa Cruz Counties. This section describes the audits and the differences among them. Table 1
reports summary statistics for the audits. These audits are, to the best of our knowledge, the first and
only risk-limiting post-election audits, according to the consensus definition discussed in Section 2.1.
The four audits explored different sampling methods, different statistical tests, and a variety of
administrative protocols to increase efficiency. They had at least a 75% chance of leading to a full
hand count, thereby correcting an erroneous apparent outcome, if the apparent electoral outcome
happened to be wrong, no matter what caused the errors that led to the incorrect outcome. That is,
these audits limited the risk that an incorrect outcome would go uncorrected to at most 25%. We could
have limited
the risk to a lower level, at the cost of more hand counting. Because the primary goal of these audits
was to gain experience, compare methods, and to understand (and reduce) the logistical complexity of
administering risk-limiting audits, we felt that a risk limit of 25% was appropriate.
3.1 Marin County, Measure A, February 2008
The first post-election risk-limiting audit ever performed was conducted by our group in Marin County
in February of 2008 for Marin’s Kentfield School District Measure A. This ballot measure, passed by a
2/3 majority of voters, raised property taxes in the Kentfield school district to support public education.
Voters in 9 precincts were eligible to vote on Measure A. In all, 6,157 ballots were cast, of which
5,877 were valid (the other 280 showed undervotes or overvotes). The initial vote count showed 4,216
votes (71.7% of valid ballots) in favor and 1,661 votes (28.3% of valid ballots) against, with a
margin of 298 votes (5.1% of valid ballots) above the 2/3 majority of votes required for the measure
to pass. Table 2 summarizes the results for the Measure A contest.

Note 40: See: Section 2.1.
Note 41: In personal communication, McBurnett describes this as having had a “fixed audit budget” and
that they chose to allocate that budget more towards larger contests.

County      Total    Winner  Loser   Margin  Precincts  Batches  Batches  # Ballots  % Ballots
            Ballots                                              Audited  Audited    Audited
Marin (A)   6,157    4,216   1,661   5.1%    9          18       12       4,336      74%
Yolo (W)    36,418   25,297  8,118   51.4%   57         114      6        2,585      7%
Marin (B)   121,295  61,839  42,047  19.1%   189        544      14       3,347      3%
Santa Cruz  26,655   12,103  9,946   9.6%    76         152      16       7,105      27%

Table 1: Summary of the four races audited. Ballots and votes for the candidates are the results as
reported when the audit commenced; they may differ from the official final results for the contests.
Margins are expressed as a percentage of the votes for all candidates. Marin Measure A required a
2/3 supermajority to pass (the margin is calculated accordingly from 5,877 total valid ballots). Yolo
County Measure W required a simple majority. Marin Measure B required a simple majority. These three
measures passed. John Leopold and Betty Danner were the main contenders for Santa Cruz County
Supervisor, 1st District; there were 103 votes in all for write-in candidates. Leopold won.
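
The Measure A margin can be checked directly from the reported counts. The sketch below is only a worked example of that arithmetic; the function name is hypothetical, and exact fractions are used so that the 2/3 threshold is not subject to floating-point rounding.

```python
from fractions import Fraction


def supermajority_margin(yes_votes: int, valid_ballots: int,
                         threshold: Fraction = Fraction(2, 3)):
    """Return (margin in votes, margin as a share of valid ballots) for a
    measure that needs `threshold` of the valid ballots to pass."""
    required = threshold * valid_ballots      # votes needed to pass
    margin_votes = yes_votes - required       # votes above the threshold
    return margin_votes, margin_votes / valid_ballots


# Reported Measure A counts: 4,216 "yes" votes out of 5,877 valid ballots.
margin_votes, margin_share = supermajority_margin(4216, 5877)
print(margin_votes, f"{float(margin_share):.1%}")   # 298 5.1%
```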
3.1.1 Test & Sample Size
For this audit, error was measured as the overstatement of the margin, in votes. The method of [18]
allows one to use a weight function to accommodate factors such as an expected level of discrepancy
or variations in batch size. We used the following weight function (note 42):

$$ w_p(x) = \frac{(x - 4)_+}{b_p}, \tag{1} $$

where x is the overstatement of the margin in votes in batch p and b_p is the total number of valid
ballots cast in batch p. This weight function ignores overstatements of up to 4 votes per batch. The risk
calculation takes that allowance into account.
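
As a small illustration, the weight function in equation (1) could be coded as follows. The function and parameter names are hypothetical; only the formula itself comes from the text.

```python
def weight(overstatement: float, batch_ballots: int, allowance: float = 4) -> float:
    """Weight function of equation (1): the overstatement of the margin in a
    batch, less a per-batch allowance and floored at zero, divided by the
    number of valid ballots in the batch."""
    return max(overstatement - allowance, 0) / batch_ballots


# An overstatement of 3 votes falls within the 4-vote allowance and is ignored.
print(weight(3, 391))
# An overstatement of 10 votes in a 300-ballot batch counts as 6 / 300.
print(weight(10, 300))
```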
We set aside the smallest batch in a stratum of its own (note 43). We used rolls of a 10-sided die to
draw
a simple random sample of 6 of the 8 batches of ballots cast in polling places to audit shortly after
election day. Once the vote-by-mail (VBM) ballots had been tabulated, we used rolls of a 10-sided die to
draw an independent simple random sample of 6 of the 8 batches of VBM ballots to audit. We postponed
deciding whether to sample provisional ballots until we could determine whether they could possibly
change the outcome, given the results of the audits of the polling-place and VBM ballots. We thus
had four strata containing a total of 18 batches: batches of ballots cast in polling places (by precinct),
batches of VBM ballots (by precinct) except for the smallest precinct, the smallest VBM precinct by itself,
and provisional ballots. By stratifying in this way we could start auditing polling-place results almost
immediately, even though VBM results were not available until a couple of weeks after election day, and
provisional results not until the end of the canvass period. Our protocol required a full hand count
if we were unable to confirm the preliminary results at our specified level of risk in the first round of
sampling. Table 3 shows the timetable for the audit.
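
One way to turn die rolls into a simple random sample of 6 batches out of 8 is sketched below. The text does not describe the exact mapping that was used, so everything here is an assumption made for illustration: the die faces are taken to be 0 to 9, the batches in a stratum are numbered from 1, and rolls that are out of range or that repeat an earlier selection are discarded.

```python
from typing import Iterable, List


def sample_from_rolls(rolls: Iterable[int], n_batches: int, sample_size: int) -> List[int]:
    """Select `sample_size` distinct batch numbers in 1..n_batches from a
    sequence of 10-sided die rolls (faces 0-9), discarding unusable rolls."""
    if not 0 < sample_size <= n_batches <= 9:
        raise ValueError("this sketch handles at most 9 batches per stratum")
    chosen: List[int] = []
    for roll in rolls:
        # Keep a roll only if it names a batch that has not been picked yet.
        if 1 <= roll <= n_batches and roll not in chosen:
            chosen.append(roll)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("ran out of rolls before the sample was complete")


# Ten recorded rolls; 0 and 9 are out of range, repeated 3 and 5 are skipped.
print(sample_from_rolls([0, 3, 3, 9, 8, 1, 5, 5, 2, 7], n_batches=8, sample_size=6))
# [3, 8, 1, 5, 2, 7]
```

Because every usable roll is equally likely to name any batch not yet chosen, discarding the others leaves each set of 6 batches equally likely. Taking the rolls as an input, rather than generating them in code, lets observers record the physical rolls and reproduce the selection.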
3.1.2 Risk Calculation
As shown in Table 2, error in the provisional ballots could have overstated the margin by up to 191 votes,
and error in the excluded precinct, precinct 2010, could have overstated the margin by up to 4 votes
(note 44).
Note 42. The notation ( )_+ means zero or the quantity in parentheses, whichever is larger.
Note 43. The excluded batch, precinct 2010, was a VBM-only batch in a precinct of 6 registered voters. We treated it as if it attained
its maximum possible error, to ensure that the audit was conservative.
Note 44. To calculate the upper bounds in Table 2, we assume that all invalid ballots and “yes” votes might really have been “no”
votes counted incorrectly, overstating the margin. Counting a “no” vote as a “yes” vote overstates the true margin by 1 vote.
In contrast, counting a “no” as an invalid ballot or undervote overstates the margin by 2/3 of a vote (since it subtracts a vote
from both the numerator and the denominator of the margin calculation). The upper bound on the amount by which error in
Batch ID    b_p   Bound   Yes    No   Audited
2001-IP     391     286   278   101   yes
2001-VBM    657     456   438   193   no
2004-IP     284     214   204    66   yes
2004-VBM    389     268   257   116   yes
2010-VBM      6       4     4     2   no
2012-IP     218     173   167    43   yes
2012-VBM    342     250   242    89   no
2014-IP     299     221   214    75   no
2014-VBM    420     319   306    95   yes
2015-IP     217     171   167    44   yes
2015-VBM    483     346   332   131   yes
2019-IP     295     222   215    70   yes
2019-VBM    567     403   395   160   yes
2101-IP     265     181   169    79   no
2101-VBM    439     296   275   133   yes
2102-IP     223     152   144    68   yes
2102-VBM    410     257   233   142   yes
ALL-PRO     252     191   176    54   no
Table 2: Results and error bounds for Marin Measure A, February 2008. A stratified random sample
of 12 batches was selected by rolling 10-sided dice. Batch ID is the precinct number followed by the
manner in which those ballots were cast (“VBM” are vote-by-mail ballots, “IP” are ballots cast in the
polling place and “PRO” are provisional ballots). b_p is the total reported ballots in that batch and Bound
is the upper bound on the discrepancy in the count for this group of ballots (see note 44). Yes and
No are the total reported votes for each selection and Audited indicates whether the set of ballots was
selected for the audit.
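
The Bound column can be reproduced from the other columns using the rule in note 44: the number of “yes” votes plus 2/3 of the number of invalid ballots. The sketch below assumes the result is rounded up to a whole vote, which is an inference from the provisional-ballot figure (190.67 reported as 191) rather than something the text states as a rule. The function name `overstatement_bound` is hypothetical.

```python
def overstatement_bound(ballots: int, yes: int, no: int) -> int:
    """Upper bound on margin overstatement: yes + (2/3) * invalid, rounded up.

    `invalid` is every ballot not counted as "yes" or "no".
    Integer arithmetic is used so the ceiling is exact.
    """
    invalid = ballots - yes - no
    return -(-(3 * yes + 2 * invalid) // 3)  # ceiling of (3*yes + 2*invalid) / 3


# (batch ID, b_p, Bound, Yes, No) for a few rows of Table 2.
rows = [
    ("2001-IP", 391, 286, 278, 101),
    ("2001-VBM", 657, 456, 438, 193),
    ("2010-VBM", 6, 4, 4, 2),
    ("ALL-PRO", 252, 191, 176, 54),
]

for batch_id, ballots, bound, yes, no in rows:
    print(batch_id, overstatement_bound(ballots, yes, no), "reported:", bound)

# Unsampled batches (provisional ballots and precinct 2010) taken together.
unsampled = overstatement_bound(252, 176, 54) + overstatement_bound(6, 4, 2)
print("unsampled total:", unsampled)
```

The last line adds the bounds for the two unsampled groups, giving the worst-case overstatement that must be assumed for them.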
At most, errors in these ballots could have inflated the apparent margin over the true margin by
195 votes. Unfortunately, any one of the other 16 batches (the 8 batches of polling-place votes and
8 batches of vote-by-mail votes) could have held enough error to account for the 103-vote “reduced
margin” that remains after allowing for those 195 votes. Thus, only one batch among the 16 would have
to have a margin overstatement of more than 4 votes for the total overstatement in all 16 to possibly
exceed 103 votes.
If only one of the batches had an overstatement of more than 4 votes, then at least one of the
polling-place counts or at least one of the vote-by-mail counts had an overstatement more than 4 votes,
or both. If precisely one of the polling-place batches overstated the margin by more than 4 votes, a
random sample of 6 of 8 batches would have missed it with probability (see note 45)

$$ \frac{\binom{7}{6}}{\binom{8}{6}} = \frac{7}{28} = 25\%. \tag{2} $$
By the same reasoning, if there were only one VBM batch with an overstatement error of more than
4 votes, a sample of 6 of 8 batches would have probability 25% of missing it. Since the chance of finding
a single bad batch is at least 75% regardless of which stratum it was in, there is at least a 75% chance
overall that the sample would contain the bad batch if there were only one bad batch in all. In other
words, if exactly one of the 16 batches from which the sample was drawn overstated the margin by
more than 4 votes, the chance the stratified sample of 12 batches would have missed it is 25%.
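
The 25% figure can be checked numerically. The sketch below computes the chance that a simple random sample misses every bad batch in a stratum; the function name `miss_probability` is an illustrative assumption. It also evaluates two hypothetical cases with two bad batches, to show that the single-bad-batch case is the least favorable.

```python
from math import comb


def miss_probability(n_batches: int, sample_size: int, n_bad: int) -> float:
    """Chance that a simple random sample of `sample_size` batches drawn from
    `n_batches` contains none of the `n_bad` bad batches."""
    return comb(n_batches - n_bad, sample_size) / comb(n_batches, sample_size)


# One bad batch in a stratum of 8, sampling 6: C(7,6) / C(8,6) = 7/28.
print(miss_probability(8, 6, 1))

# Hypothetical: two bad batches in the same stratum.
print(miss_probability(8, 6, 2))

# Hypothetical: one bad batch in each stratum; the two samples are independent,
# so the chances of missing multiply.
print(miss_probability(8, 6, 1) * miss_probability(8, 6, 1))
```

Both two-batch cases give a chance of missing all the error that is well below 25%.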
Having only one bad batch is a hypothetical worst-case. If two or more batches overstated the
margin by more than 4 votes, the chance that the sample would have missed all of them is considerably
less than 25%. The worst-case calculation guarantees that the risk is no greater than 25%: For all other
hypothetical situations in which the outcome is wrong, the risk is lower. None of the manual tally results
had a discrepancy of more than 4 votes. Hence, the audit limited the risk to at most 25%, without a full
hand count.

Note 44 (continued): the provisional ballots could have overstated the margin is thus the number of “yes” votes plus 2/3 of the number of invalid
ballots: 176 + (2/3) · (252 − 176 − 54) = 190.67 ≈ 191. (For an extended discussion of how changing vote totals can affect
election margins, see: http://josephhall.org/eamath/margins09.pdf.)
Note 45. The notation \binom{x}{y} is shorthand for the binomial coefficient x!/(y!(x−y)!).

Milestone                                      Date
Election day                                   5 February
Polling place results available                7 February
Random selection of polling place precincts    14 February
VBM results available                          20 February
Random selection of VBM precincts              20 February
Hand tally complete                            20 February
Provisional ballot results available           29 February
Computations complete                          3 March
Table 3: Timeline for audit of Marin Measure A, February 2008.
The margin in this race is relatively small, about 4.8% of the ballots cast, including undervoted and
invalid ballots. The percentage of undervotes was about 4.5%, larger than the margin. And the race was
small: only 9 precincts, of which one had only 6 registered voters. These features made it necessary to
audit a much higher percentage of ballots than would have been necessary had the margin been larger,
had the race been larger, or had there been fewer undervotes. The audit effort for this race is not typical:
It is virtually a worst-case scenario. The other three pilot audits described below are of larger contests;
the required sampling fractions are correspondingly smaller.
The total cost of the manual tally was $1,501, including the salaries and benefits of four people
tallying the count, a supervisor, and the support staff needed to print reports, resolve discrepancies,
transport the ballots and locate and retrieve VBM ballots from the batches in which they were counted.
This amounts to $0.35 per ballot audited. The tally took 1¾ days of counting to complete.
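
The per-ballot cost can be cross-checked against Table 2. The sketch below assumes that the number of ballots hand-counted equals the sum of the reported batch sizes b_p for the 12 audited batches; the text does not state the ballot count directly, so that total is an inference.

```python
# Reported batch sizes (b_p) of the 12 batches marked "yes" in Table 2.
audited_batch_sizes = {
    "2001-IP": 391, "2004-IP": 284, "2004-VBM": 389, "2012-IP": 218,
    "2014-VBM": 420, "2015-IP": 217, "2015-VBM": 483, "2019-IP": 295,
    "2019-VBM": 567, "2101-VBM": 439, "2102-IP": 223, "2102-VBM": 410,
}

total_cost = 1501  # dollars, from the text
ballots_audited = sum(audited_batch_sizes.values())
cost_per_ballot = total_cost / ballots_audited

print(ballots_audited)
print(round(cost_per_ballot, 2))
```

Under that assumption the result agrees with the $0.35 figure reported above.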
These figures do not include the statistician’s time, most of which was spent re-keying election
results from election management system (EMS) reports into machine-readable form. That was not
terribly burdensome because the contest was so small. Performing the risk calculations took very little
time.
3.2 Yolo County, Measure W, November 2008
The second audit we performed was of Measure W in Yolo County, California. Measure W raised property
taxes for the Davis Joint Unified School District. The 36,418 ballots cast contained 33,415 valid votes.
In all, 25,297 Yes votes (69.5% of ballots) were cast, 8,118 No votes (22.3% of ballots) were cast, and
3,003 ballots (8.2% of ballots) showed undervotes or overvotes. The measure passed with a margin of
17,179 votes (51.4% of valid ballots).
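
The totals and percentages above are mutually consistent, as the short check below shows. Variable names are illustrative; all input figures come from the text.

```python
ballots_cast = 36_418
yes_votes = 25_297
no_votes = 8_118

valid_votes = yes_votes + no_votes          # ballots with a countable vote
other_ballots = ballots_cast - valid_votes  # undervotes and overvotes
margin = yes_votes - no_votes               # margin in votes


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


print(valid_votes, other_ballots, margin)
print(pct(yes_votes, ballots_cast), pct(no_votes, ballots_cast),
      pct(other_ballots, ballots_cast), pct(margin, valid_votes))
```

Note that the first three percentages are shares of all ballots cast, while the margin is expressed as a share of valid votes.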
The race included 57 precincts. For each precinct, VBM ballots were tabulated and reported separately
from IP ballots, giving 114 auditable batches. The distribution of total votes per batch was
centered at around 300–400 votes, with a handful of precincts with 100 or fewer votes. Table 4 reports
details of the batches selected for audit.
3.2.1 Test & Sample Size
Like the audit of Marin County Measure A described in the previous section, this audit used a stratified
random sample. However, it used the maximum relative overstatement of pairwise margins (MRO) of [17]
instead of the margin overstatement (see note 46).
The MRO divides the overstatement of the margin between each apparent winner and each apparent
loser by the reported margin between them. In contests with more than two contestants, the MRO leads
to a sharper test than the raw margin overstatement used in the Marin Measure A audit, because it
normalizes errors onto a scale of zero to 100% of the margin: An overstatement of one vote of a large
margin (e.g., between the winner and fourth place) casts less doubt on the outcome of a contest than an
overstatement of one vote of a small margin (e.g., between the winner and the runner-up). In the Yolo
election this does not matter because there were only two candidates, “yes” and “no.”

Note 46. In contrast, the audits in Santa Cruz and Marin counties described below used unstratified sampling with probability
proportional to size.

Batch ID      b_p   Bound   Yes    No
100037-IP     396     594   285    87
100039-VBM    435     690   337    82
100051-IP     443     600   280   123
100056-IP     284     437   209    56
100060-IP     671    1001   483   153
100063-VBM    356     548   257    65
Table 4: Summary of audit sample for Yolo’s Measure W, November 2008. Six batches were selected
using a random selection from the 57 decks of vote-by-mail ballots and 57 batches of ballots cast in
polling places. “VBM” denotes vote-by-mail ballots and “IP” denotes ballots cast in polling places. b_p is
the total reported ballots in that batch and Bound is the upper bound on the discrepancy in the count
for this group of ballots. Yes and No are the votes for and against the measure. The audit found one
error that increased the margin and one that decreased it.
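
To make the MRO concrete, the sketch below computes it for one batch in a made-up three-candidate contest. The function name `mro`, the candidate labels, and all vote counts in the first example are hypothetical; only the definition (overstatement of each winner-loser margin divided by the reported contest-wide margin for that pair, maximized over pairs) follows the description above. The second part checks an observation about Table 4: in all six rows the Bound column equals b_p + Yes − No. That pattern is read off the table and is not a formula stated in the text.

```python
def mro(reported, actual, winners, losers, contest_totals):
    """Maximum relative overstatement of pairwise margins for one batch.

    reported, actual: dicts mapping candidate -> votes in this batch.
    contest_totals:   dict mapping candidate -> reported votes in the whole contest.
    """
    worst = float("-inf")
    for w in winners:
        for l in losers:
            contest_margin = contest_totals[w] - contest_totals[l]
            # How much the batch's reported margin exceeds its hand-counted margin.
            overstatement = (reported[w] - reported[l]) - (actual[w] - actual[l])
            worst = max(worst, overstatement / contest_margin)
    return worst


# Hypothetical contest: A wins, B is the runner-up, C is far behind.
contest_totals = {"A": 1000, "B": 900, "C": 400}
reported = {"A": 100, "B": 90, "C": 40}
actual = {"A": 99, "B": 90, "C": 41}

# The A-B margin is overstated by 1 vote out of 100; A-C by 2 votes out of 600.
# The smaller overstatement dominates because it is against the smaller margin.
print(mro(reported, actual, ["A"], ["B", "C"], contest_totals))

# Table 4 rows: (b_p, Bound, Yes, No).
table4 = [(396, 594, 285, 87), (435, 690, 337, 82), (443, 600, 280, 123),
          (284, 437, 209, 56), (671, 1001, 483, 153), (356, 548, 257, 65)]
print(all(b + yes - no == bound for b, bound, yes, no in table4))
```

In a two-choice contest such as Measure W there is only one winner-loser pair, so the MRO reduces to the margin overstatement divided by the reported margin.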
As in the audit of Marin Measure A, batches consisted of votes for one precinct cast in one way—in
the polling place or by mail—but provisional ballots were counted along with the polling-place ballots
for each precinct. Moreover, we stratified the batches differently: Batches with very small error bounds
were grouped into one stratum. The remaining batches comprised a second stratum. The first stratum
was not sampled. Instead, batches in that stratum were treated as if they attained their maximum
possible error. A simple random sample was drawn from the second stratum. We did not draw the
sample until preliminary vote counts had been reported for all batches in the county and provisional
ballots had been resolved.
As described in [20], batches can be exempted from sampling if they are treated as if they attained
their maximum possible error. This can improve efficiency if the worst-case error in those batches is
much smaller than in other batches. That can happen if the batches contain relatively few ballots, as is
typical in rural precincts. When such batches are set aside, a sample of a given size from the remaining
batches has a higher chance of containing a precinct that holds a large error, if there is enough error in
the aggregate to alter the apparent outcome of the race.
In the Yolo County audit, we grouped 11 small batches that could contain no more than 5 overstatement
errors each into one stratum and treated them as if they attained their worst-case error: a total of
0.04% of the margin (see note 47). The remaining 103 batches comprised the second stratum. By sampling from the
second stratum alone, a sample of a given size had a larger chance of finding at least one batch with a
large overstatement error—if there was enough error in all to cause the apparent outcome to differ from
the outcome a full hand count would show. There is a trade-off: Grouping more small batches into the
unsampled stratum entails assuming that there is more error, since those batches must be treated as if
they have their maximum possible error, to keep the analysis conservative. That means that less error
can be tolerated in the remaining batches, which reduces the error threshold that leads to expanding the
audit. However, it increases the sampling fraction among the remaining batches, increasing the chance
that the sample will contain a batch with large errors—if any such batch exists—which tends to reduce
the initial sample size.
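
The effect of setting small batches aside on the sampling fraction can be illustrated with a short calculation. The sketch below compares the chance that a 6-batch simple random sample contains at least one bad batch when drawing from all 114 batches versus from the 103 batches in the sampled stratum. The number of bad batches (10) is a hypothetical value chosen for illustration, and the comparison assumes all bad batches lie in the sampled stratum; it does not model the offsetting cost of assuming maximum error in the unsampled batches.

```python
from math import comb


def detection_chance(n_batches: int, sample_size: int, n_bad: int) -> float:
    """Chance that a simple random sample of `sample_size` batches from
    `n_batches` contains at least one of `n_bad` bad batches."""
    return 1 - comb(n_batches - n_bad, sample_size) / comb(n_batches, sample_size)


sample_size = 6   # batches sampled in the Yolo audit (Table 4)
n_bad = 10        # hypothetical number of batches holding large errors

print(detection_chance(114, sample_size, n_bad))  # sampling from every batch
print(detection_chance(103, sample_size, n_bad))  # sampling from the second stratum only
```

With the sample size held fixed, the smaller population gives the higher detection chance, which is the efficiency gain the paragraph above describes.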
Note 47. For the Yolo audit, we supposed that errors of up to 5 votes per batch might occur even when the outcome is correct; the
Yolo County election staff had not seen an error of more than 1 or 2 votes in quite some time. That led to grouping batches
with error bounds of 5 votes or less into the unsampled stratum. The resulting expected sample size was 2,121 ballots. The
actual sample size turned out to be 2,585 ballots.