Accuracy vs. melodic similarity • musicassessr

In the musicassessr framework, we generally reason for employing measures of melodic similarity for scoring melodic recall data (as argued in our Music Perception paper). It is worth noting that using measures of melodic similarity on such data is quite different to using measures of accuracy. In this vignette, we intuitively demonstrate the difference between accuracy and melodic similarity approaches to scoring melodic recall data.

1. Foundations of accuracy-style measures

First, let us define accuracy in the context of melodic recall research. In this context, broadly speaking, accuracy can be defined as the number of “correct” notes relative to something else, where “correct” usually means “contained in the target stimulus”. The “something else” could be i) the number of unique notes in the target stimulus or ii) the attempt length.

Let us call a note that is both contained in the target melody and the sung recall a “hit”, a note recalled that is not in the target stimulus, a “false alarm” and a note in the target melody but not the sung recall, a “miss”.

Table: (#tab:unnamed-chunk-3)Confusion matrix for note-by-note comparison of target melody to sung recall.

	Sung Recall Melody Note
	Present	Absent
Target Melody Note
Present	Hit	Miss
Absent	False Alarm	Correct Rejection

Accuracy can be defined as:

\[\begin{equation} Accuracy = \frac{NoHits}{NoHits + NoFalseAlarms + NoMisses} \tag{1} \end{equation}\]

Another measure related to accuracy is precision, defined as:

\[\begin{equation} Precision = \frac{NoHits}{NoHits + NoMiss} \tag{2} \end{equation}\]

which means the denominator will be the number of notes in the stimulus and is, hence, the proportion of notes in the target stimulus recalled.

Lastly, recall is defined as:

\[\begin{equation} Recall = \frac{NoHits}{NoHits + NoFalseAlarms} \tag{3} \end{equation}\]

where the denominator represents the number of notes in the sung recall.

Precision and recall can be combined together into a score known as the F₁ score, which is the harmonic mean of precision and recall, simplifying to:

\[\begin{equation} F_1 = \frac{2 * NoHits}{2 * NoHits + NoMisses + NoFalseAlarms} \tag{4} \end{equation}\]

The F₁ score as well as precision and recall have been used extensively in computational contexts and music information retrieval contexts [@pearceMelodicGroupingMusic2010]. One issue in melodic recall research is that the length of a recall can be substantially different from that of a target melody length, in terms of number of notes. One way of dealing with this, in the context of accuracy measurement, is to normalise based on the length of the stimuli:

\[\begin{equation} AccuracyAdjusted = Accuracy * LogNormal(\frac{NoNotesInSungRecall}{NoNotesInStimuli}) \quad \in [0,1] \tag{5} \end{equation}\]

where Accuracy is computed via Eq. 1 and LogNormal represents a log normal transformation function to map the accuracy value into the closed unit interval range.

We profile these measures of accuracy, Accuracy, Recall, Precision, AccuracyAdjusted and F₁, on melodic recall data here.

2. Example comparisons of accuracy vs. similarity measures on the same data

We now score some simple examples to demonstrate the difference between accuracy and similarity scored on the same data. Let us take the first five notes of the C major scale as our target. If the target and recall are the same, all measures of accuracy and similarity will be 1. However, as Example 1 shows, if the pitch order is reversed, the accuracy measures will be 1, but the overall measure of similarity (opti3), nearly half this. This corresponds to the fact that the notes are the same, but the order highly distorts the melodic identity (note that the harmonic and rhythmic identity is preserved, however).

Table 2: Reversed recall similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
1	1	1	1	1	1	0.27	0	1	0

Conversely, as Example #2 shows, transposing the recall by a semitone causes a large deterioration in the accuracy scores, since nearly all notes are different, but the similarity measure is 1, corresponding to the fact that the melodic identity is preserved under the principle of transposition invariance.

Table 3: Transposed recall similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
2	0.11	0.2	0.2	0.2	0.2	0.78	1	1	0

For Examples 3-6, we take the humourous example of the “The Lick” - the archetypal jazz pattern made famous by YouTube and transform it by the familiar melodic transformations of retrograde, inversion and retrograde inversion. Example #3 shows that the retrograde preserves accuracy but destroys identity and similarity.

Table 4: The Lick vs. its retrograde (rhythms the same) similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
3	1	1	1	1	1	0.39	0	1	0.5

Examples #4 and #5 show that the inversion and retrograde inversion somewhat preserves accuracy but destroys identity and similarity.

Table 5: The Lick plus its inversion (rhythms the same) similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
4	0.56	0.71	0.71	0.71	0.71	0.27	0	1	0

Table 6: The Lick plus its retrograde inversion (rhythms the same) similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
5	0.56	0.71	0.71	0.71	0.71	0.4	0.25	1	0

Finally, consider no pitch value changes, but only rhythmic changes, which destroys rhythmic identity and lowers overall melodic similarity (opti3), but preserves accuracy.

Table 7: The Lick plus its rhythm values changed similarity results
Example No.	Acc.	Prec.	Recall	F1	AccAdj	opti3	ngrukkon	rhythfuzz	harmcore
6	1	1	1	1	1	0.54	1	0.14	0.5

The examples should intuitively demonstrate how accuracy and similarity measures diverge and specifically, that similarity metrics embody notions of perceptual similarity across relevant musical dimensions, as opposed to note-by-note accuracy measures, which do not necessarily embody anything musical into their consideration. In other words: accuracy measures may score a target and its recall very highly, despite the musical differences being very large, and the opposite, a very musically similar target and recall very badly! These observations are not new. Perceptual experiments have demonstrated that the familiar transformations of retrograde, inversion and retrograde inversion destroy melodic identity (e.g., Dowling, 1972). However, we are not aware of such a distinction being made clearly within melodic recall research. To profile the difference more quantitatively, look at our simulation study. In Silas & Müllensiefen (2023), we profiled some more sophisticated variations of accuracy measurements and compared them to melodic similarity measures, which we favour in our test environment; though we note, all the accuracy measures described above are available in our framework.

References

Dowling, W. J. (1972). Recognition of melodic transformations: Inversion, retrograde, and retrograde inversion. Perception & Psychophysics, 12(5), 417–421. https://doi.org/10.3758/BF03205852

Silas, S., & Müllensiefen, D. (2023). Learning and recalling melodies: A computational investigation using the melodic recall paradigm. Music Perception.

3. Code used above

library(dplyr)
library(knitr)
library(gm)

# Note you will need MuseScore installed on your system for gm to work.

sim_vs_acc <- function(target, recall, 
                       target_rhythms = rep(0.5, length(target)), 
                       recall_rhythms = rep(0.5, length(recall)),
                       example_no,
                       example_description) {
  
  h <- musicassessr::score_melodic_production(
    user_melody_input = recall,
    user_duration_input = recall_rhythms,
    user_onset_input = c(0, cumsum(recall_rhythms)[1:(length(recall_rhythms)-1)]),
    stimuli = target,
    stimuli_durations = target_rhythms,
  )
  
  
  tibble::tibble(
    `Example No.` = example_no,
    `Example Desc.` = example_description,
    `Acc.` = h$accuracy,
    Prec. = h$precision,
    Recall = h$recall,
    F1 = h$F1_score,
    PMI = h$PMI,
    AccAdj= h$proportion_of_correct_note_events_controlled_by_stimuli_length_log_normal,
    opti3 = h$opti3,
    ngrukkon = h$ngrukkon,
    rhythfuzz = h$rhythfuzz,
    harmcore = h$harmcore
  ) %>% dplyr::mutate(dplyr::across(`Acc.`:harmcore, round, 2))
}


create_sheet <- function(target_v, 
                         recall_v, 
                         target_rhythms_v = rep("quarter", length(target_v)),
                         recall_rhythms_v = rep("quarter", length(recall_v)),
                         key = NULL,
                         target_title = "Target",
                         recall_title = "Recall",
                         export = TRUE,
                         just_return_name = FALSE) {
  l <- Line(
  pitches = as.list(target_v),
  durations = as.list(target_rhythms_v),
  name = target_title
  )
  
  l2 <- Line(
  pitches = as.list(recall_v),
  durations = as.list(recall_rhythms_v),
  name = recall_title
  )
  
  
  if(!is.null(key)) {
    m <- Music() + Meter(4, 4)+ Key(key) + l + l2
  } else {
    m <- Music() + Meter(4, 4) + l + l2
  }
  
  if(export) {
    fn <- paste0(target_title, "_", recall_title)
    if(!just_return_name) gm::export(m, getwd(), fn, "png")
    return(paste0(fn, ".png"))
  } else {
    gm::show(m) 
  }
}

invert_melody <- function(m) {
  start_note <- m[1]
  intervals <- diff(m)
  inverted_intervals <- intervals * -1
  itembankr::rel_to_abs_mel(inverted_intervals, start_note)
}

# Example #1: Reversed Recall

ex1_t <- c(60, 62, 64, 65, 67)
ex1_r <- ex1_t[length(ex1_t):1]

ex1_rhythms <- c(0.25, 0.25, 0.25, 0.25, 0.5)
ex1_rhythms_text <- c("quarter","quarter", "quarter", "quarter", "half")


# Example #2: Transposed Recall
ex2_t <- ex1_t
ex2_r <- ex1_t + 1


# The Lick

the_lick <- c(67, 69, 70, 72, 69, 65, 67)
the_lick_rhythms <- c(0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 1)
the_lick_rhythms_text <- c("eighth","eighth", "eighth", "eighth", "quarter", "quarter", "half")



# Example #3: The Lick plus its retrograde (rhythms the same

the_lick_retrograde <- the_lick[length(the_lick):1]


# Example #4: The Lick plus its inversion (rhythms the same)

the_lick_inverted <- invert_melody(the_lick)

# "Example #5: The Lick plus its retrograde inversion (rhythms the same)

the_lick_retrograde_inversion <- the_lick_inverted[length(the_lick_inverted):1]


# Example #6: The Lick plus its rhythm values changed.

the_lick_rhythms_diff <- c(0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.5)
the_lick_rhythms_text_diff <- c("quarter","quarter", "quarter", "quarter", "eighth", "eighth", "quarter")

s1 <- create_sheet(ex1_t, ex1_r, ex1_rhythms_text, ex1_rhythms_text,
                   target_title = "Ex.1 Target", recall_title = "Ex.1 Recall")

s2 <- create_sheet(ex2_t, ex2_r, ex1_rhythms_text, ex1_rhythms_text,
                   target_title = "Ex.2 Target", recall_title = "Ex.2 Recall")

s3 <- create_sheet(the_lick, the_lick_retrograde, the_lick_rhythms_text, the_lick_rhythms_text, key = -1, 
                   target_title = "Ex.3 Target", recall_title = "Ex.3 Recall")

s4 <- create_sheet(the_lick, the_lick_inverted, the_lick_rhythms_text, the_lick_rhythms_text, key = -1, 
                   target_title = "Ex.4 Target", recall_title = "Ex.4 Recall")

s5 <- create_sheet(the_lick, the_lick_retrograde_inversion, the_lick_rhythms_text, the_lick_rhythms_text, key = -1, 
                   target_title = "Ex.5 Target", recall_title = "Ex.5 Recall")

s6 <- create_sheet(the_lick, the_lick, the_lick_rhythms_text, the_lick_rhythms_text_diff, key = -1,
                   target_title = "Ex.6 Target", recall_title = "Ex.6 Recall")

main_tb <- rbind(
  sim_vs_acc(target = ex1_t, recall = ex1_r, example_no = 1, example_description = "Reversed Recall", 
             target_rhythms = ex1_rhythms, recall_rhythms = ex1_rhythms),
  sim_vs_acc(target = ex2_t, recall = ex2_r, example_no = 2, example_description = "Transposed Recall", 
             target_rhythms = ex1_rhythms, recall_rhythms = ex1_rhythms),
  sim_vs_acc(
  target =  the_lick,
  target_rhythms = the_lick_rhythms,
  recall = the_lick_retrograde,
  recall_rhythms = the_lick_rhythms,
  example_no = 3,
  example_description = "The Lick vs. its retrograde (rhythms the same)"),
  
  sim_vs_acc(
    target =  the_lick,
    target_rhythms = the_lick_rhythms,
    recall = the_lick_inverted,
    recall_rhythms = the_lick_rhythms,
    example_no = 4,
    example_description = "The Lick plus its inversion (rhythms the same)"),
  
  sim_vs_acc(
    target =  the_lick,
    target_rhythms = the_lick_rhythms,
    recall = the_lick_retrograde_inversion,
    recall_rhythms = the_lick_rhythms,
    example_no = 5,
    example_description = "The Lick plus its retrograde inversion (rhythms the same)"
  ),
  
  
  sim_vs_acc(
    target =  the_lick,
    target_rhythms = the_lick_rhythms,
    recall = the_lick,
    recall_rhythms = the_lick_rhythms_diff,
    example_no = 6,
    example_description = "The Lick plus its rhythm values changed."
  )


)
  
main_tb %>% 
  knitr::kable(caption = "Examples of two comparison melodies scored for their level of similarity.")