Harp-L Archives

From: John Thaden
Date: Tue, 13 Apr 1999 14:36:46 -0500
Subject: Materials Test: Reanalysis of SPAH '97 Data [LONG]

Effects of Harmonica Comb Material on Harmonica Sound: A Monte Carlo Analysis

John J. Thaden

Department of Geriatrics, University of Arkansas, 4300 W. 7th St. RSCH-151,
Little Rock Arkansas USA 72205. jjthad~lash.net

Instrument choice can be perplexing for harmonica players. One
variable is comb composition. Available comb materials include pear wood,
various plastics, and metals such as aluminum, titanium, and brass. Do
comb materials have detectable effects on the sound of a harmonica? The
null hypothesis--that comb materials do not affect sound--was tested at the
1997 Society for the Preservation and Appreciation of Harmonica (SPAH)
convention by asking a group of harmonica players/enthusiasts to identify
comb material by hearing the instruments played.

Test instruments were diatonic harmonicas, all with unmodified Hohner
Big River reedplates and covers, but with combs fabricated of different
materials. The harmonicas were played in three separate test series.
During each, four harmonicas were first identified and played, then played
in random order without identification. In the first two test series, the
same brief test melody (the first and last phrases of 'Summertime') was
played eight times by British professional John Walden; in the third
series, a vacuum pump actuated the reeds for five tests. Thus, there were
8 + 8 + 5 = 21 tests total. A fourth series tested chromatic harmonica
covers; it will not be discussed.

The 1997 conclusion was that participants' harmonica identification
scores did not differ significantly from scores that might result from
random guessing, and thus that the null hypothesis (no materials effect on
sound) could not be rejected with acceptable confidence (95%). In other
words, if the whole experiment could be repeated 100 times, and if indeed
comb materials make no difference, then test scores as good as or better
than those achieved by the SPAH 1997 audience would be likely to occur by
chance alone more than five times in those 100 experiments.

An active and prolonged debate ensued on Harp-L. Some faulted the test
design and/or execution. On February 18, 1999, the raw data were published
on Harp-L [Vern Smith, SPAH97 Materials Test Raw Data], excluding a few
listeners who "turned in test papers on which only a few or no selections
were made".

This letter describes a new analysis of the SPAH'97 data that has
addressed a problem in the original analysis caused by missing guesses, and
has used resampling statistical methods to empirically discover the
expected distribution of scores assuming no effect of comb material on
harmonica sound, rather than merely assuming it is normal. The conclusions
differ from the 1997 conclusion.

RESULTS (Note: Use a nonproportional font to view tables)

o If missing data are scored the same as incorrect selections, mean and
median scores are lower, but so are the mean and median scores one expects
by chance.

Some listeners left blanks indicating to me that
they could not distinguish among the harps played.
I scored that the same as an incorrect selection.
[Smith, op. cit.]

Table 1 shows the extent and pattern of omitted responses, runs of
four or more omitted responses, and duplicated guesses, made by 27
respondents after each of 21 harmonica test sounds.

Table 1. Data structure: omissions and double-guesses by respondent

Individ #            *   3   5   6   7   8   9  10  12  14  16  19  23  24  27
Number blank:        0  21   5   8   0   0   5   1   6   0   6   0   1   0   0
Blanks in runs >3:   0  21   -   6   -   -   5   -   -   -   -   -   -   -   -
Doubled entries:     0   0   2   1   3   3   0   2   0   2   1   1   1   1   6

* Individuals 1, 2, 4, 11, 13, 15, 17, 18, 20-22, 25, 26

Listener #3 left all answers blank (but did participate in the final,
cover-materials test not discussed here). It is not clear if these 21
omissions were scored as wrong in the 1997 analysis. Respondent #9 answered
all queries in the first two test series, but none in the third series or
the fourth (cover-materials) series. Listener #6 omitted the final six
guesses of the second test series. Rather than reflecting an inability to
distinguish harmonicas, these omissions may have been due to the
individuals' absence from the room or a simple lack of interest. The
remaining omissions were
scattered, none occurring in runs of four or more. Nonetheless, one cannot
rule out explanations unrelated to ability to distinguish harmonicas, for
instance, a broken pencil, or an intrusive background noise. To discover
the expected score distribution assuming no materials effect, a Monte Carlo
simulation was done with 10,000 replications, each structured like the
actual experiment (21 tests x 26 respondents, 32 omissions, 23
double-guesses).

Table 2 shows summary statistics based on the analysis method of 1997,
excluding listener #3. Omissions were scored as wrong (0), correct doubled
guesses as half-right (0.5), and correct single guesses as right (1). The
scores are percentages because the sum has been divided by the total number
of tests (21) and multiplied by 100%. The same scoring method was used for
10,000 replications in a Monte Carlo experiment, each reproducing the
structure of the actual experiment (including omissions and doubled
guesses), but with all guesses being strictly random choices of one of four
possibilities, i.e., assuming no materials effects. Table 2 also shows the
exact probability (P) that the actual mean, median, and maximum scores came
from distributions revealed by the Monte Carlo experiment. A P-value less
than 0.05 generally is considered to indicate a significant difference
between actual and expected values, and thus that respondents' efforts to
identify these harmonicas were aided by differences in how they sound.

Table 2. Omissions scored as wrong: summary of data from 1997 SPAH and a
Monte Carlo experiment

            SPAH'97     Monte Carlo        P
Mean         27.75    23.52 +/- 2.91    0.0104
Median       28.57    23.10 +/- 4.17    0.0363
Maximum      57.14    43.59 +/- 8.33    0.0278
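
The simulation described above can be sketched in a few lines of Python.
This is not the program used for the analysis, which this letter does not
reproduce. It is a minimal sketch that assumes the per-respondent counts of
blanks and double guesses as read from Table 1, a one-sided P-value (the
share of simulated values at or above the observed one), and function names
that are purely illustrative. Its output should not be expected to match
Table 2 digit for digit.

```python
import random
import statistics

N_TESTS = 21  # 8 + 8 + 5 comb-material tests per respondent

# (blank answers, double guesses) for respondents 5, 6, 7, 8, 9, 10, 12, 14,
# 16, 19, 23, 24 and 27, as read from Table 1. Listener #3 is excluded.
BLANKS_DOUBLES = [(5, 2), (8, 1), (0, 3), (0, 3), (5, 0), (1, 2), (6, 0),
                  (0, 2), (6, 1), (0, 1), (1, 1), (0, 1), (0, 6)]
# The other 13 respondents left no blanks and made no double guesses.
RESPONDENTS = BLANKS_DOUBLES + [(0, 0)] * 13  # 26 respondents in total


def random_score(blanks, doubles, blank_points, rng):
    """Percent score for one respondent who guesses purely at random."""
    singles = N_TESTS - blanks - doubles
    points = blanks * blank_points
    # A single guess names the right comb 1 time in 4 and earns 1 point.
    points += sum(rng.random() < 0.25 for _ in range(singles))
    # A double guess covers 2 of 4 combs and earns half a point when right.
    points += 0.5 * sum(rng.random() < 0.5 for _ in range(doubles))
    return 100.0 * points / N_TESTS


def null_distribution(n_reps, blank_points=0.0, seed=0):
    """Mean, median and maximum score of each simulated experiment."""
    rng = random.Random(seed)
    means, medians, maxima = [], [], []
    for _ in range(n_reps):
        scores = [random_score(b, d, blank_points, rng)
                  for b, d in RESPONDENTS]
        means.append(statistics.fmean(scores))
        medians.append(statistics.median(scores))
        maxima.append(max(scores))
    return means, medians, maxima


def upper_tail_p(simulated, observed):
    """Assumed one-sided P: share of simulated values >= the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)


if __name__ == "__main__":
    means, medians, maxima = null_distribution(10_000)
    print("expected mean if guessing:", round(statistics.fmean(means), 2))
    print("P(mean >= 27.75):", upper_tail_p(means, 27.75))
    print("P(median >= 28.57):", upper_tail_p(medians, 28.57))
    print("P(maximum >= 57.14):", upper_tail_p(maxima, 57.14))
```

Calling null_distribution(10_000, blank_points=0.25) gives the alternative
scoring in which an omission earns the 0.25 points expected from a random
guess.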

The Monte Carlo experiment showed that the expected mean and median are not
25%, as one might think based on the fact that each test involved four
possible harmonicas, only one of which could be the correct answer, but
23.5% and 23.1%, respectively. This difference is real, and can be
attributed to the scoring method: omitted answers are never right,
while purely random guesses will be right 25% of the time. Because the
mean and median distributions are shifted downward, the actual mean and
median scores at SPAH'97, though little different from 25%, are
significantly different from the prediction of chance.
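
The size of this downward shift can be checked by hand from the counts
given earlier: 26 respondents x 21 tests = 546 answers, of which 32 were
omitted and 23 were double guesses, leaving 491 single guesses. As a rough
check, assuming each single guess is right with probability 1/4 and each
double guess earns 1/2 point with probability 1/2:

```latex
\begin{align*}
E[\text{mean score}]
  &= \frac{491 \times 0.25 \;+\; 23 \times (0.5 \times 0.5) \;+\; 32 \times 0}{546}
     \times 100\% \\
  &= \frac{128.5}{546} \times 100\% \approx 23.5\%
\end{align*}
```

This agrees with the simulated mean of 23.52 in Table 2. Crediting each
omission with 0.25 points instead adds 32 x 0.25 = 8 to the numerator,
giving 136.5/546 = 25%.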

The method for handling missing data chosen in the present analysis
was to score them no better than expected by chance (0.25 points). Table 3
shows the SPAH'97 results scored in this manner (again omitting listener
#3), and also the empirical results of a Monte Carlo experiment.

Table 3. Omissions scored as random guesses: summary of data from 1997 SPAH
and a Monte Carlo experiment

            SPAH'97     Monte Carlo        P
Mean         29.17    25.00 +/- 2.88    0.0122
Median       30.36    24.62 +/- 3.87    0.0055
Maximum      57.14    44.00 +/- 7.74    0.0253

Regardless of the method for handling missing data, the mean, median and
maximum scores from SPAH'97 all exceed what random guessing would produce,
provided the random score distributions are estimated empirically via a
resampling technique like the Monte Carlo experiments.

o Double guesses were appropriately treated in the 1997 analysis:

We allowed the listeners to name two comb materials
if they chose. In that case the probability of
guessing correctly rises [from 25%] to 50% so they
should be scored 1/2 correct if they hit on two
selections. [Smith, op. cit.]

Table 1 also shows the distribution of double entries among respondents.
Duplicate guesses may be considered 'sampling without replacement' from
four comb materials, and indeed, the probability of making a correct
response with two random guesses is 0.5. Scores therefore included 1/2
point for a pair of guesses where one was correct (Table 2). Alternative
analyses were also done wherein double guesses were treated essentially as
single guesses on two separate tests and missing data were handled as
described for Tables 2 and 3; the results were similar and led to identical
conclusions [data not shown].
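
The 0.5 probability for a double guess can be confirmed by listing every
possible pair. The short sketch below does so with placeholder comb labels
(A to D are stand-ins, not the actual materials in any test series) and
also shows that, with half credit, a random double guess has the same
expected score as a random single guess.

```python
from fractions import Fraction
from itertools import combinations

COMBS = ["A", "B", "C", "D"]  # placeholder labels for the four combs
TRUE_COMB = "A"               # which label is correct does not matter

# Every way to name two different combs (sampling without replacement).
pairs = list(combinations(COMBS, 2))
hits = [pair for pair in pairs if TRUE_COMB in pair]

p_double = Fraction(len(hits), len(pairs))  # random pair includes right comb
p_single = Fraction(1, len(COMBS))          # random single guess is right

expected_double = p_double * Fraction(1, 2)  # half a point when pair is right
expected_single = p_single * 1               # one point when guess is right

print(p_double, expected_double, expected_single)  # prints: 1/2 1/4 1/4
```

Because both kinds of guess are worth 0.25 points on average under random
guessing, allowing double guesses does not by itself shift the expected
score.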

o Mean, median, and maximum scores at SPAH 1997 exceeded the predictions
of chance.

Tables 2 and 3 both illustrate significant ability in the group of
respondents to outperform mere chance guessing when trying to identify
harmonicas by sound.

CONCLUSION:

Something about the different harmonicas allowed respondents to
identify which of four harmonicas was being played more often than chance
would predict. The ability to do so
was not strong, but influenced the mean, median and maximum scores enough
that they deviated significantly from the mean, median and maximum values
that would be obtained by random guessing.

Were respondents actually hearing differences in harmonica sounds
caused by differences in comb material? The test definitely fails to prove
this. The problem that rules out this conclusion is that each harmonica
also had its own reedplates and covers. Though these components
were all from the same model harmonica (Hohner Big River), it is common
knowledge among harmonica players that at least reedplates (and the
attached reeds) can differ markedly for different harmonicas of the same
model. No attempt was made to normalize the reeds' musical pitches,
timbre, rest position with respect to the reedplates (offset),
responsiveness, etc.

John Thaden
Little Rock, Arkansas, USA