Transition to JANA2
This page will document the transition of the GlueX software stack from JANA1 to JANA2.
Useful Links
- Talk at Fall 2024 Collaboration Meeting (Raiqa): Slides
- Transition guide for developers
- Tools for comparison:
  - rootdiff.py compares ROOT files. It is part of this repository: [1]
  - compare_hists_dirs.py creates a PDF file with all plots from two different files
Discussion Points
- How do we treat halld_sim and hdgeant4 developments for previous run periods? Can we patch old halld_recon versions?
A: We need fine-grained version set selection in MCWrapper.
halld_recon
Prerequisite Tests
- Objects: hd_dump
  - DTrackTimeBased
  - DTrackWireBased
  - DBCALShower
  - DVertex
  - DChargedTrack
- Plugins: hd_root
  - monitoring_hists
    hd_root -PPLUGINS=monitoring_hists /cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio
  - occupancy_online
    hd_root -PPLUGINS=occupancy_online /cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio
  - danarest
    hd_root -PPLUGINS=danarest /cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio
  - p2pi_hists, p3pi_hists: output histograms can be checked with the macros HistMacro_p2pi.C and HistMacro_p3pi.C
    hd_root -PPLUGINS=p2pi_hists,p3pi_hists /cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio
  - ReactionFilter with at least two reactions, configured in jana_test.config; produces two trees
    hd_root --config=/group/halld/www/halldweb/html/talks/2024/jana2/jana_test.config /cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio
  - mcthrown_tree (for MC)
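The hd_root plugin checks above can be scripted so that each plugin set runs in its own directory with its own log. The following is a minimal sketch, not an existing halld_recon tool: the function name run_plugin_checks, the HD_ROOT and INPUT variables and the check_* directory names are our own conventions. It only detects runs that exit with a non-zero status; whether the histograms agree between JANA1 and JANA2 still has to be checked with rootdiff.py or the HistMacro macros.

```bash
#!/bin/bash
# Sketch: run the hd_root plugin checks, one directory and one log per plugin set,
# and report which runs exit with a non-zero status.
# HD_ROOT and INPUT are hypothetical conventions: point HD_ROOT at the hd_root
# of the build under test (JANA1 or JANA2).
HD_ROOT=${HD_ROOT:-hd_root}
INPUT=${INPUT:-/cache/halld/RunPeriod-2017-01/rawdata/Run030300/hd_rawdata_030300_000.evio}

run_plugin_checks() {
    local failures=0 plugins dir
    for plugins in monitoring_hists occupancy_online danarest p2pi_hists,p3pi_hists; do
        # Separate directory per run so output files (e.g. hd_root.root) are not overwritten
        dir="check_${plugins//,/_}"
        mkdir -p "$dir"
        if (cd "$dir" && "$HD_ROOT" -PPLUGINS="$plugins" "$INPUT" > log.txt 2>&1); then
            echo "PASS $plugins"
        else
            echo "FAIL $plugins (see $dir/log.txt)"
            failures=$((failures + 1))
        fi
    done
    return "$failures"   # number of plugin sets whose run failed
}
```

After sourcing the script, running run_plugin_checks prints one PASS or FAIL line per plugin set; its return value is the number of failed runs.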
Additional Tests
Benchmarking Results
hdgeant4
hdgeant4 simulates the detector response (detector hits) to the events produced by a generator. It is steered by a control.in file in the working directory and processes the generated HDDM file. When control.in and the input file are present in the working directory, simply execute:
hdgeant4
halld_sim
hdgeant
The same input file and control.in used for hdgeant4 can also be used for hdgeant (GEANT3):
hdgeant
mcsmear
mcsmear applies the detector resolution to the simulated hits so that the MC results match the actual measurements. It reads the HDDM output of the simulation and can be executed with:
mcsmear gen_amp_030730_000_geant4.hddm
b1pi test
The b1pi test runs the full simulation and reconstruction chain for the reaction
gamma p -> p X, X -> b1 pi-, b1 -> omega pi+, omega -> pi+ pi- pi0
Usage:
b1pi_test.sh [-n <number of events>] [-t <number of threads>] [-r <run number>] [-v <vertex string>] [-d <b1pi_test script directory>]
The script performs these steps:
- genr8: event generator, part of the halld_sim repository; output: b1_pi.ascii
- genr8_2_hddm: converts the genr8 output to an HDDM file, part of the halld_sim repository; output: b1_pi.hddm
- hdgeant4: runs the simulation with Geant4; output: hdgeant4.hddm
- mcsmear: adds detector effects to the simulation, part of the halld_sim repository; output: hdgeant_smeared.hddm
- hd_root with the danarest plugin: runs the reconstruction, part of halld_recon; output: dana_rest.hddm
- hd_root with the b1pi_hists and monitoring_hists plugins: analysis, part of halld_recon; output: hd_root.root
- root mk_pics.C: creates plots and saves them as PDF and GIF files
Example:
source /group/halld/Software/build_scripts/gluex_env_boot_jlab.sh
gxenv /group/halld/www/halldweb/html/halld_versions/version_5.21.1_jana2.xml
export B1PI_TEST_DIR=/group/halld/Software/hd_utilities/b1pi_test/
export SEED=123
export JANA_CALIB_CONTEXT="variation=mc"
$B1PI_TEST_DIR/b1pi_test.sh -n 10000 -r 30480 -4
In the following plots, we compare a few results from JANA1 (left) and JANA2 (right):
The remaining differences have two causes: the halld_sim versions were not exactly the same, and the behavior of the Get() function changed in JANA2.
We repeated the b1pi test with version_5.22.1.xml (JANA1), version_5.22.2_jana2.xml (JANA2) and version_5.22.2_j2mj1.xml (JANA2 mimicking JANA1). With JANA2, the number of reconstructed X(2000) events dropped by 4 out of 3705; with j2mj1 it was fully recovered. The number of reconstructed photons dropped by 3 out of 31480 for both JANA2 and j2mj1, with no further consequences.
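For scale, the two reductions quoted above correspond to the following relative changes:

```latex
\frac{4}{3705} \approx 1.1\times10^{-3} \;(\approx 0.11\%), \qquad
\frac{3}{31480} \approx 9.5\times10^{-5} \;(\approx 0.0095\%)
```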
Monitoring launch
As a next step, we would like to run a full monitoring launch with JANA2. We prepared a configuration file, jana_offmon.config, with the necessary parameter changes. It can be run on data from the GlueX-II 2023-01 run period, e.g.:
hd_root --loadconfigs jana_offmon.config /cache/halld/RunPeriod-2023-01/rawdata/Run121102/hd_rawdata_121102_018.evio
We observe several different failure modes:
- Plugin not compatible with JANA2, fails already during loading: BCAL_TDC_Timing, pi0fcaltofskim
- Crash while running: TOF_online, TOF_TDC_shift, BCAL_inv_mass, HLDetectorTiming, FCAL_invmass, trackeff_missing, CDC_dedx
- Segfault at the end of the job: BCAL_online, PS_flux, BCAL_Eff, BCAL_attenlength_gainratio
- Infinite loop at event 0 when combined with monitoring_hists: fa125_itrig
- Infinite loop or crash: TAGM_TW
All other plugins appear to run, tested with 4 threads and 10k events:
PLUGINS occupancy_online,highlevel_online,danarest,monitoring_hists,TAGH_online,TAGM_online,TAGM_clusters,BEAM_online,CDC_online,CDC_Efficiency,FCAL_online,FDC_online,FDC_Efficiency,ST_online_lowlevel,lowlevel_online,PS_online,PSC_online,PSPair_online,TPOL_online,BCAL_Hadronic_Eff,FCAL_Hadronic_Eff,p2pi_hists,p3pi_hists,ppi0gamma_hists,TRIG_online,CDC_drift,RF_online,CDC_expert_2,L1_online,FCAL_TimingOffsets_Primex,p4pi_hists,p2k_hists,CDC_TimeToDistance,TOF_calib,CDC_amp,TPOL_tree,evio_writer,randomtrigger_skim,syncskim,imaging,TAGH_timewalk,TrackingPulls,lumi_mon,PS_timing,ST_Tresolution,dirc_hists,dirc_reactions,DIRC_online
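To sort plugins into the failure modes listed above, each plugin can be run on its own under a time limit and its exit status classified. A minimal sketch follows, assuming GNU coreutils timeout (which exits with status 124 when the limit is reached) and the usual shell convention that a process killed by signal N reports status 128+N; classify_run is a hypothetical helper, not part of halld_recon or JANA2. The exit status alone cannot distinguish a crash during processing from a segfault at the end of the job, so the log still has to be inspected.

```bash
#!/bin/bash
# Sketch: run a command under a time limit and print a one-word verdict.
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
# classify_run is a hypothetical helper, not a GlueX or JANA2 tool.
classify_run() {
    local limit=$1 rc=0
    shift
    timeout "$limit" "$@" > /dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo "OK" ;;
        124) echo "TIMEOUT" ;;                # did not finish in time: possible infinite loop
        139) echo "SEGFAULT" ;;               # 128 + 11 (SIGSEGV)
        *)
            if [ "$rc" -gt 128 ]; then
                echo "SIGNAL$((rc - 128))"    # killed by another signal, e.g. 6 (SIGABRT)
            else
                echo "FAIL$rc"                # ordinary non-zero exit status
            fi ;;
    esac
}
```

For example, classify_run 2h hd_root -PPLUGINS=TAGM_TW -PNTHREADS=4 -Pjana:nevents=10000 /cache/halld/RunPeriod-2023-01/rawdata/Run121102/hd_rawdata_121102_018.evio prints TIMEOUT if the job does not finish within the (arbitrarily chosen) two hours.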
For this job, we compare the resident memory footprint of version_5.21.0.xml (JANA1) and version_5.21.1_jana2.xml (JANA2):
A similar picture emerges when running only monitoring_hists:
hd_root -PPLUGINS=monitoring_hists -PMONITOR:MEMORY_EVENTS=10000 -Pjana:nevents=10000 -PNTHREADS=4 /cache/halld/RunPeriod-2023-01/rawdata/Run121102/hd_rawdata_121102_018.evio
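To record the resident memory of such a job over time independently of JANA, one option is to poll the VmRSS field of /proc/<pid>/status, which the Linux kernel reports in kB. A minimal sketch, assuming a Linux system; rss_kb and sample_rss are hypothetical helper names, not GlueX tools.

```bash
#!/bin/bash
# Sketch: sample the resident set size of a process from /proc (Linux only).
# rss_kb and sample_rss are hypothetical helpers, not part of any GlueX package.

# Print the current VmRSS of process $1 in kB.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Print "elapsed_seconds rss_kB" every $2 seconds (default 10) until process $1 exits.
# The loop stops when /proc/<pid>/status disappears or no longer has a VmRSS line
# (as for a zombie process).
sample_rss() {
    local pid=$1 interval=${2:-10} t=0 rss
    while rss=$(rss_kb "$pid" 2>/dev/null) && [ -n "$rss" ]; do
        echo "$t $rss"
        sleep "$interval"
        t=$((t + interval))
    done
}
```

For example, starting the hd_root command above in the background with a trailing & and then calling sample_rss $! 30 > rss.txt writes one line every 30 s until the job ends; doing this once per version set gives two curves that can be plotted against each other.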
The memory leak and all plugin failures listed above are fixed in version_5.22.1_jana2.xml and version_5.22.1_j2mj1.xml. We compare their performance with version_5.22.0.xml for 100k (left) and 1M events (right):
HOW-TO use gdb
gdb --args hd_root -PPLUGINS=HLDetectorTiming /cache/halld/RunPeriod-2023-01/rawdata/Run121102/hd_rawdata_121102_018.evio
(gdb) catch throw    (optional)
(gdb) run
(gdb) continue       (many times)
(gdb) bt
#0  0x00007ffff48ad7c2 in __cxa_throw () from /lib64/libstdc++.so.6
#1  0x000000000087a7e7 in JFactoryT<DEventRFBunch>* JEvent::GetSingle<DEventRFBunch>(DEventRFBunch const*&, char const*, bool) const ()
#2  0x00007fffe3e6d776 in JEventProcessor_HLDetectorTiming::Process(std::shared_ptr<JEvent const> const&) ()