По-перше, я можу помилитися, але я думаю, що ваше рішення не буде працювати в тому випадку, коли є кілька крапок, коли оцінки щільності ядра (KDE) перетинаються. По-друге, хоча overlap
пакет був створений для використання з даними часових позначок, ви все одно можете використовувати його для оцінки площі накладання будь-яких двох KDE. Вам просто потрібно змінити масштаб даних, щоб вони становили від 0 до 2π.
Для прикладу:
# simulate two sample
a <- rnorm(100)
b <- rnorm(100, 2)
# To use overplapTrue(){overlap} the scale must be in radian (i.e. 0 to 2pi)
# To keep the *relative* value of a and b the same, combine a and b in the
# same dataframe before rescaling. You'll need to load the ‘scales‘ library.
# But first add a "Source" column to be able to distinguish between a and b
# after they are combined.
a = data.frame( value = a, Source = "a" )
b = data.frame( value = b, Source = "b" )
d = rbind(a, b)
library(scales)
d$value <- rescale( d$value, to = c(0,2*pi) )
# Now you can created the rescaled a and b vectors
a <- d[d$Source == "a", 1]
b <- d[d$Source == "b", 1]
# You can then calculate the area of overlap as you did previously.
# It should give almost exactly the same answers.
# Or you can use either the overlapTrue() and overlapEst() function
# provided with the overlap packages.
# Note that with these function the KDE are fitted using von Mises kernel.
library(overlap)
# Using overlapTrue():
# define limits of a common grid, adding a buffer so that tails aren't cut off
lower <- min(d$value)-1
upper <- max(d$value)+1
# generate kernel densities
da <- density(a, from=lower, to=upper, adjust = 1)
db <- density(b, from=lower, to=upper, adjust = 1)
# Compute overlap coefficient
overlapTrue(da$y,db$y)
# Using overlapEst():
overlapEst(a, b, kmax = 3, adjust=c(0.8, 1, 4), n.grid = 500)
# You can also plot the two KDEs and the region of overlap using overlapPlot()
# but sadly I haven't found a way of changing the x scale so that the scale
# range correspond to the initial x value and not the rescaled value.
# You can only change the maximum value of the scale using the xscale argument
# (i.e. it always range from 0 to n, where n is set with xscale = n).
# So if some of your data take negative value, you're probably better off with
# a different plotting method. You can change the x label with the xlab
# argument.
overlapPlot(a, b, xscale = 10, xlab= "x metrics", rug=T)