Digging deeper
The core functioning of Recombinase is based on a set of preimplemented analysis functions and on the OnlineStats package to compute summary statistics efficiently.
Analysis functions
Under the hood, the actual computations are implemented in some built-in Analysis
objects. Each Analysis
takes some vectors (a varying number, one for density
and two for prediction
for example), an optional axis
argument to specify the x
axis argument and returns an iterator of (x, y)
values, representing the y
value corresponding to each point of the x
axis.
using Recombinase: density, hazard, cumulative, prediction
x = randn(100)
res = density(x)
collect(Iterators.take(res, 5)) # let's see the first few elements
5-element Array{Tuple{Float64,Float64},1}:
(-3.2822076324657883, 0.011137115097713741)
(-3.2223437577156213, 0.011724289863314653)
(-3.162479882965454, 0.012353142916872527)
(-3.102616008215287, 0.01309271718531768)
(-3.0427521334651195, 0.014024742114368386)
We can collect the iterator in a table container to see that we recover the x
and y
axis, ready for plotting
using JuliaDB
s = table(res)
Table with 100 rows, 2 columns:
1 2
───────────────────
-3.28221 0.0111371
-3.22234 0.0117243
-3.16248 0.0123531
-3.10262 0.0130927
-3.04275 0.0140247
-2.98289 0.0152384
-2.92302 0.0168238
-2.86316 0.0188649
-2.8033 0.021432
⋮
2.22527 0.0639509
2.28513 0.0564181
2.345 0.0495919
2.40486 0.04346
2.46472 0.0379894
2.52459 0.0331317
2.58445 0.028829
2.64432 0.0250198
prediction
is similar but requrires two arguments:
xs = 10 .* rand(100)
ys = sin.(xs) .+ 0.5 .* rand(100)
res = prediction(xs, ys)
table(res)
Table with 98 rows, 2 columns:
1 2
──────────────────
0.317205 0.758609
0.414354 0.8122
0.511502 0.860048
0.608651 0.902176
0.7058 0.938782
0.802948 0.969975
0.900097 0.995706
0.997246 1.01614
1.09439 1.03156
⋮
9.06058 0.611289
9.15773 0.556451
9.25488 0.496429
9.35203 0.431236
9.44918 0.36094
9.54633 0.285347
9.64348 0.204329
9.74062 0.117791
The function discrete
can turn any analysis into its discrete counter-part (analyses are continuous for numerical axes and discrete otherwise by default). It also automatically comutes the error of the prediction (s.e.m).
using Recombinase: discrete
x = rand(1:3, 1000)
y = rand(1000) .+ 0.1 .* x
res = discrete(prediction)(x, y)
table(res)
Table with 3 rows, 2 columns:
first second
────────────────────────────
1 (0.620643, 0.0154153)
2 (0.695753, 0.0157434)
3 (0.801415, 0.0152603)
OnlineStats integration
Summary statistics are computed using the excellent OnlineStats package. First the data is split by the splitting variable (in this case School
), then the analysis is computed on each split chunk on the selected column(s). Finally the results from different schools are put together in summary statistics (default is mean
and s.e.m
):
using Recombinase: datafolder, compute_summary
data = loadtable(joinpath(datafolder, "school.csv"))
compute_summary(density, data, :School; select = :MAch)
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
(-2.832, (0.007095095625459628, 0.0005825497483108032))
(-2.550939393939394, (0.007838324051760595, 0.0006260574836083454))
(-2.269878787878788, (0.008619592372159876, 0.0006696627630044319))
(-1.9888181818181818, (0.009438313664963307, 0.0007134192242440505))
(-1.7077575757575758, (0.010293824814249444, 0.0007574849542421938))
(-1.4266969696969698, (0.011185676423288606, 0.0008020675411028164))
(-1.1456363636363636, (0.01211335769099234, 0.0008474013402249302))
(-0.8645757575757576, (0.01307621450967119, 0.0008936976283633891))
(-0.5835151515151515, (0.014073305125439826, 0.0009410934724699248))
(-0.3024545454545455, (0.015103285160103232, 0.000989599947597033))
⋮
(22.744515151515152, (0.030139050666702846, 0.0014970627285730306))
(23.02557575757576, (0.028632688481218565, 0.0014533685239022608))
(23.306636363636365, (0.02704926915097793, 0.0014012627830656734))
(23.58769696969697, (0.025399256774123875, 0.001340841300777614))
(23.868757575757577, (0.023696502599550594, 0.0012724899216037388))
(24.149818181818183, (0.02195754966684903, 0.0011970079579388833))
(24.43087878787879, (0.02020131354820369, 0.0011156135636389236))
(24.711939393939392, (0.018448259380673813, 0.0010299231340946513))
(24.993, (0.016719486604897808, 0.0009418573882592296))
Any summary statistic can be used. If we only want the mean we can use
using OnlineStats
compute_summary(density, data, :School; select = :MAch, stats = Mean())
100-element StructArray(::Array{Float64,1}, ::Array{Float64,1}) with eltype Tuple{Float64,Float64}:
(-2.832, 0.007095095625459628)
(-2.550939393939394, 0.007838324051760595)
(-2.269878787878788, 0.008619592372159876)
(-1.9888181818181818, 0.009438313664963307)
(-1.7077575757575758, 0.010293824814249444)
(-1.4266969696969698, 0.011185676423288606)
(-1.1456363636363636, 0.01211335769099234)
(-0.8645757575757576, 0.01307621450967119)
(-0.5835151515151515, 0.014073305125439826)
(-0.3024545454545455, 0.015103285160103232)
⋮
(22.744515151515152, 0.030139050666702846)
(23.02557575757576, 0.028632688481218565)
(23.306636363636365, 0.02704926915097793)
(23.58769696969697, 0.025399256774123875)
(23.868757575757577, 0.023696502599550594)
(24.149818181818183, 0.02195754966684903)
(24.43087878787879, 0.02020131354820369)
(24.711939393939392, 0.018448259380673813)
(24.993, 0.016719486604897808)
We can also pass a Tuple
of statistics to compute many at the same time:
using OnlineStats
compute_summary(density, data, :School; select = :MAch, stats = Series(Mean(), Variance()))
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
(-2.832, (0.007095095625459628, 5.4298273481116836e-5))
(-2.550939393939394, (0.007838324051760595, 6.271167564512218e-5))
(-2.269878787878788, (0.008619592372159876, 7.175171458475678e-5))
(-1.9888181818181818, (0.009438313664963307, 8.143471832335724e-5))
(-1.7077575757575758, (0.010293824814249444, 9.180535294452776e-5))
(-1.4266969696969698, (0.011185676423288606, 0.00010292997447851489))
(-1.1456363636363636, (0.01211335769099234, 0.00011489424502640124))
(-0.8645757575757576, (0.01307621450967119, 0.00012779127215077542))
(-0.5835151515151515, (0.014073305125439826, 0.00014170510782808015))
(-0.3024545454545455, (0.015103285160103232, 0.0001566892890054481))
⋮
(22.744515151515152, (0.030139050666702846, 0.00035859149012520443))
(23.02557575757576, (0.028632688481218565, 0.00033796481060317384))
(23.306636363636365, (0.02704926915097793, 0.0003141659819527931))
(23.58769696969697, (0.025399256774123875, 0.0002876568630193606))
(23.868757575757577, (0.023696502599550594, 0.0002590768960932943))
(24.149818181818183, (0.02195754966684903, 0.0002292524882190425))
(24.43087878787879, (0.02020131354820369, 0.00019913497974002222))
(24.711939393939392, (0.018448259380673813, 0.00016971866594293585))
(24.993, (0.016719486604897808, 0.00014193525437095956))
One can optionally pass pairs of OnlineStat
and function to apply the function to the OnlineStat
before plotting (default is just taking the value
). To compute a different error bar (for example just the standard deviation) you can simply do:
stats = (Mean(), Variance() => t -> sqrt(value(t)))
compute_summary(density, data, :School; select = :MAch, stats = stats)
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
(-2.832, (0.007095095625459628, 0.007368736220079861))
(-2.550939393939394, (0.007838324051760595, 0.007919070377583607))
(-2.269878787878788, (0.008619592372159876, 0.008470638381182187))
(-1.9888181818181818, (0.009438313664963307, 0.009024118700646464))
(-1.7077575757575758, (0.010293824814249444, 0.009581510994855026))
(-1.4266969696969698, (0.011185676423288606, 0.01014544106870248))
(-1.1456363636363636, (0.01211335769099234, 0.010718873309560163))
(-0.8645757575757576, (0.01307621450967119, 0.011304480180476032))
(-0.5835151515151515, (0.014073305125439826, 0.011903995456487714))
(-0.3024545454545455, (0.015103285160103232, 0.012517559227159587))
⋮
(22.744515151515152, (0.030139050666702846, 0.018936512089748853))
(23.02557575757576, (0.028632688481218565, 0.018383819260512052))
(23.306636363636365, (0.02704926915097793, 0.017724727979655797))
(23.58769696969697, (0.025399256774123875, 0.016960449965120635))
(23.868757575757577, (0.023696502599550594, 0.016095865807507662))
(24.149818181818183, (0.02195754966684903, 0.015141086097735608))
(24.43087878787879, (0.02020131354820369, 0.014111519398704812))
(24.711939393939392, (0.018448259380673813, 0.013027611674552471))
(24.993, (0.016719486604897808, 0.01191365831182679))
This page was generated using Literate.jl.