Digging deeper

The core functioning of Recombinase is based on a set of preimplemented analysis functions and on the OnlineStats package to compute summary statistics efficiently.

Analysis functions

Under the hood, the actual computations are implemented in some built-in Analysis objects. Each Analysis takes some vectors (a varying number, one for density and two for prediction for example), an optional axis argument to specify the x axis argument and returns an iterator of (x, y) values, representing the y value corresponding to each point of the x axis.

using Recombinase: density, hazard, cumulative, prediction
x = randn(100)
res = density(x)
collect(Iterators.take(res, 5)) # let's see the first few elements
5-element Array{Tuple{Float64,Float64},1}:
 (-3.2822076324657883, 0.011137115097713741)
 (-3.2223437577156213, 0.011724289863314653)
 (-3.162479882965454, 0.012353142916872527) 
 (-3.102616008215287, 0.01309271718531768)  
 (-3.0427521334651195, 0.014024742114368386)

We can collect the iterator in a table container to see that we recover the x and y axis, ready for plotting

using JuliaDB
s = table(res)
Table with 100 rows, 2 columns:
1         2
───────────────────
-3.28221  0.0111371
-3.22234  0.0117243
-3.16248  0.0123531
-3.10262  0.0130927
-3.04275  0.0140247
-2.98289  0.0152384
-2.92302  0.0168238
-2.86316  0.0188649
-2.8033   0.021432
⋮
2.22527   0.0639509
2.28513   0.0564181
2.345     0.0495919
2.40486   0.04346
2.46472   0.0379894
2.52459   0.0331317
2.58445   0.028829
2.64432   0.0250198

prediction is similar but requrires two arguments:

xs = 10 .* rand(100)
ys = sin.(xs) .+ 0.5 .* rand(100)

res = prediction(xs, ys)
table(res)
Table with 98 rows, 2 columns:
1         2
──────────────────
0.317205  0.758609
0.414354  0.8122
0.511502  0.860048
0.608651  0.902176
0.7058    0.938782
0.802948  0.969975
0.900097  0.995706
0.997246  1.01614
1.09439   1.03156
⋮
9.06058   0.611289
9.15773   0.556451
9.25488   0.496429
9.35203   0.431236
9.44918   0.36094
9.54633   0.285347
9.64348   0.204329
9.74062   0.117791

The function discrete can turn any analysis into its discrete counter-part (analyses are continuous for numerical axes and discrete otherwise by default). It also automatically comutes the error of the prediction (s.e.m).

using Recombinase: discrete

x = rand(1:3, 1000)
y = rand(1000) .+ 0.1 .* x

res = discrete(prediction)(x, y)
table(res)
Table with 3 rows, 2 columns:
first  second
────────────────────────────
1      (0.620643, 0.0154153)
2      (0.695753, 0.0157434)
3      (0.801415, 0.0152603)

OnlineStats integration

Summary statistics are computed using the excellent OnlineStats package. First the data is split by the splitting variable (in this case School), then the analysis is computed on each split chunk on the selected column(s). Finally the results from different schools are put together in summary statistics (default is mean and s.e.m):

using Recombinase: datafolder, compute_summary
data = loadtable(joinpath(datafolder, "school.csv"))
compute_summary(density, data, :School; select = :MAch)
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
 (-2.832, (0.007095095625459628, 0.0005825497483108032))             
 (-2.550939393939394, (0.007838324051760595, 0.0006260574836083454)) 
 (-2.269878787878788, (0.008619592372159876, 0.0006696627630044319)) 
 (-1.9888181818181818, (0.009438313664963307, 0.0007134192242440505))
 (-1.7077575757575758, (0.010293824814249444, 0.0007574849542421938))
 (-1.4266969696969698, (0.011185676423288606, 0.0008020675411028164))
 (-1.1456363636363636, (0.01211335769099234, 0.0008474013402249302)) 
 (-0.8645757575757576, (0.01307621450967119, 0.0008936976283633891)) 
 (-0.5835151515151515, (0.014073305125439826, 0.0009410934724699248))
 (-0.3024545454545455, (0.015103285160103232, 0.000989599947597033)) 
 ⋮                                                                   
 (22.744515151515152, (0.030139050666702846, 0.0014970627285730306)) 
 (23.02557575757576, (0.028632688481218565, 0.0014533685239022608))  
 (23.306636363636365, (0.02704926915097793, 0.0014012627830656734))  
 (23.58769696969697, (0.025399256774123875, 0.001340841300777614))   
 (23.868757575757577, (0.023696502599550594, 0.0012724899216037388)) 
 (24.149818181818183, (0.02195754966684903, 0.0011970079579388833))  
 (24.43087878787879, (0.02020131354820369, 0.0011156135636389236))   
 (24.711939393939392, (0.018448259380673813, 0.0010299231340946513)) 
 (24.993, (0.016719486604897808, 0.0009418573882592296))             

Any summary statistic can be used. If we only want the mean we can use

using OnlineStats
compute_summary(density, data, :School; select = :MAch, stats = Mean())
100-element StructArray(::Array{Float64,1}, ::Array{Float64,1}) with eltype Tuple{Float64,Float64}:
 (-2.832, 0.007095095625459628)             
 (-2.550939393939394, 0.007838324051760595) 
 (-2.269878787878788, 0.008619592372159876) 
 (-1.9888181818181818, 0.009438313664963307)
 (-1.7077575757575758, 0.010293824814249444)
 (-1.4266969696969698, 0.011185676423288606)
 (-1.1456363636363636, 0.01211335769099234) 
 (-0.8645757575757576, 0.01307621450967119) 
 (-0.5835151515151515, 0.014073305125439826)
 (-0.3024545454545455, 0.015103285160103232)
 ⋮                                          
 (22.744515151515152, 0.030139050666702846) 
 (23.02557575757576, 0.028632688481218565)  
 (23.306636363636365, 0.02704926915097793)  
 (23.58769696969697, 0.025399256774123875)  
 (23.868757575757577, 0.023696502599550594) 
 (24.149818181818183, 0.02195754966684903)  
 (24.43087878787879, 0.02020131354820369)   
 (24.711939393939392, 0.018448259380673813) 
 (24.993, 0.016719486604897808)             

We can also pass a Tuple of statistics to compute many at the same time:

using OnlineStats
compute_summary(density, data, :School; select = :MAch, stats = Series(Mean(), Variance()))
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
 (-2.832, (0.007095095625459628, 5.4298273481116836e-5))              
 (-2.550939393939394, (0.007838324051760595, 6.271167564512218e-5))   
 (-2.269878787878788, (0.008619592372159876, 7.175171458475678e-5))   
 (-1.9888181818181818, (0.009438313664963307, 8.143471832335724e-5))  
 (-1.7077575757575758, (0.010293824814249444, 9.180535294452776e-5))  
 (-1.4266969696969698, (0.011185676423288606, 0.00010292997447851489))
 (-1.1456363636363636, (0.01211335769099234, 0.00011489424502640124)) 
 (-0.8645757575757576, (0.01307621450967119, 0.00012779127215077542)) 
 (-0.5835151515151515, (0.014073305125439826, 0.00014170510782808015))
 (-0.3024545454545455, (0.015103285160103232, 0.0001566892890054481)) 
 ⋮                                                                    
 (22.744515151515152, (0.030139050666702846, 0.00035859149012520443)) 
 (23.02557575757576, (0.028632688481218565, 0.00033796481060317384))  
 (23.306636363636365, (0.02704926915097793, 0.0003141659819527931))   
 (23.58769696969697, (0.025399256774123875, 0.0002876568630193606))   
 (23.868757575757577, (0.023696502599550594, 0.0002590768960932943))  
 (24.149818181818183, (0.02195754966684903, 0.0002292524882190425))   
 (24.43087878787879, (0.02020131354820369, 0.00019913497974002222))   
 (24.711939393939392, (0.018448259380673813, 0.00016971866594293585)) 
 (24.993, (0.016719486604897808, 0.00014193525437095956))             

One can optionally pass pairs of OnlineStat and function to apply the function to the OnlineStat before plotting (default is just taking the value). To compute a different error bar (for example just the standard deviation) you can simply do:

stats = (Mean(), Variance() => t -> sqrt(value(t)))
compute_summary(density, data, :School; select = :MAch, stats = stats)
100-element StructArray(::Array{Float64,1}, StructArray(::Array{Float64,1}, ::Array{Float64,1})) with eltype Tuple{Float64,Tuple{Float64,Float64}}:
 (-2.832, (0.007095095625459628, 0.007368736220079861))             
 (-2.550939393939394, (0.007838324051760595, 0.007919070377583607)) 
 (-2.269878787878788, (0.008619592372159876, 0.008470638381182187)) 
 (-1.9888181818181818, (0.009438313664963307, 0.009024118700646464))
 (-1.7077575757575758, (0.010293824814249444, 0.009581510994855026))
 (-1.4266969696969698, (0.011185676423288606, 0.01014544106870248)) 
 (-1.1456363636363636, (0.01211335769099234, 0.010718873309560163)) 
 (-0.8645757575757576, (0.01307621450967119, 0.011304480180476032)) 
 (-0.5835151515151515, (0.014073305125439826, 0.011903995456487714))
 (-0.3024545454545455, (0.015103285160103232, 0.012517559227159587))
 ⋮                                                                  
 (22.744515151515152, (0.030139050666702846, 0.018936512089748853)) 
 (23.02557575757576, (0.028632688481218565, 0.018383819260512052))  
 (23.306636363636365, (0.02704926915097793, 0.017724727979655797))  
 (23.58769696969697, (0.025399256774123875, 0.016960449965120635))  
 (23.868757575757577, (0.023696502599550594, 0.016095865807507662)) 
 (24.149818181818183, (0.02195754966684903, 0.015141086097735608))  
 (24.43087878787879, (0.02020131354820369, 0.014111519398704812))   
 (24.711939393939392, (0.018448259380673813, 0.013027611674552471)) 
 (24.993, (0.016719486604897808, 0.01191365831182679))              

This page was generated using Literate.jl.