FA125 firmware check

From GlueXWiki
Revision as of 15:18, 2 February 2016

Used the raw samples to emulate the fa125's calculated values and compared them with the fa125 output.

Runs 3923 and 4062 (first few files)

Number of discrepancies between firmware output and emulation output ('complete events' have Pulse and Raw data present)
Run total events complete events time q amplitude pedestal integral overflow count
3923 16582555 1160 1942 3547 15609 1
3923 minus 2 bad fadcs 15914850 378 1877 288 12798 1
4062 file 000 10249118 10247801 0 0 98 (70 early hits) 0 3 0
4062 file 001 10283167 10282756 3 0 68 (46 early hits) 0 2 1
4062 file 002 10262816 10259658 1 0 65 (41 early hits) 0 5 1
4062 file 003 10248768 10245940 2 0 58 (40 early hits) 1 4 3
4062 file 004 10251199 10249150 0 0 73 (53 early hits) 0 8 2


Many of the problems in 3923 were due to hardware faults in roc28 slots 5&6.

11770 of the 12798 differences in integral were due to faulty assignment of the overflow bit. This has been fixed.

There were a few problems in the firmware which Cody described & fixed before run 4062. The remaining issues are not critical.


Differences in data/emulation from run 4062

The first 4 are firmware logic, & not necessarily a mistake (could be a mistake in the emulator); the remaining 3 are more weird.

1. Integral differences - these are from very late hits where there is a small peak at the end of the hit search window that only just clears the threshold crossing, followed by a larger peak a few samples later. The timing algorithm returns the time for the larger peak and the emulated integral is 0 because it is out of the window. [Cody to fix]

2. Amplitude differences - both firmware and emulation are starting the peak search from the threshold crossing sample but it should really start from the sample containing the leading edge time, since very occasionally the search will pick up a different peak (usually a small one before a larger one). [Cody and Naomi to fix]

3. Amplitude differences - early hits - 70 of the 98 differences occur where the samples at the start of the window are over threshold, decrease for one sample and then rise again, i.e. hitsample==20 && (adc[20]>=adc[21]) && (adc[21]<adc[22]), where adc[20] is the first sample in the hit search window. The emulator returns maxamp = adc[20]; the firmware returns the amplitude of the following peak. [Cody to fix]

4. Amplitude differences - later hits - 28 of the 98 differences seem to have no apparent cause, no association with roc/slot/channel, in most cases the max amp reported is larger than all the sample values [???]

5. Overflow count - the firmware counts overflows from sample (hit sample - PG) onward, but the emulator counts them over the entire data window. Naomi will change the emulator to match the firmware. [Naomi to fix]

6. Missing pulse or WRD - pulse data and window raw data (WRD) are separate objects in the evio event loop and are not linked yet. One out-of-sync pair causes the rest of the data for that trigger to be out of step.
[from WRD without straws, all digihits have straws, David to fix w association]
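
The early-hit case in item 3 and the overflow-count difference in item 5 can be illustrated with a small Python sketch. This is not the firmware or emulator code: the function names, the sample values, the walk-up rule in `following_peak_amp` and the full-scale value 4095 are assumptions made for illustration, and the overflow range is taken to start at sample (hit sample - PG).

```python
WINDOW_START = 20      # first sample in the hit search window (from item 3)
OVERFLOW_VALUE = 4095  # assumed full-scale ADC value, not taken from this page


def is_early_dip(adc, hitsample, start=WINDOW_START):
    """Condition quoted in item 3: over threshold at the window start,
    a one-sample decrease, then a rise."""
    return (hitsample == start
            and adc[start] >= adc[start + 1]
            and adc[start + 1] < adc[start + 2])


def emulator_maxamp(adc, start=WINDOW_START):
    """Emulator behaviour described in item 3: report the first window sample."""
    return adc[start]


def following_peak_amp(adc, start=WINDOW_START):
    """Hypothetical stand-in for the firmware: climb from the dip sample
    while the samples keep rising and report the local maximum."""
    i = start + 1
    while i + 1 < len(adc) and adc[i + 1] > adc[i]:
        i += 1
    return adc[i]


def count_overflows(adc, first=0):
    """Count samples at the assumed overflow value from index `first` onward."""
    return sum(1 for s in adc[max(first, 0):] if s >= OVERFLOW_VALUE)


# Invented waveform: flat pedestal, then a dip and rise at the window start.
adc = [100] * 20 + [300, 280, 350, 420, 390, 200]
adc[5] = OVERFLOW_VALUE  # an overflow long before the hit

hitsample, pg = 20, 4
print(is_early_dip(adc, hitsample))
print(emulator_maxamp(adc), following_peak_amp(adc))
print(count_overflows(adc), count_overflows(adc, hitsample - pg))
```

With this invented waveform the two amplitude definitions disagree, and the overflow count depends on whether the whole window or only the range starting at (hit sample - PG) is scanned.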


fa125 object mismatches - consequence of missing pulse data

Every now and then a CDCPulseData is missing from the start of an event, and the subsequent CDCPulseData and WindowRawData objects are then out of step for a while (the mismatches continue into following events and eventually stop). The objects are kept together in the analysis code when the PulseData is associated with the WindowRawData.
In run 4731 Beni found 2 pulse data words missing from 57 files
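
A minimal Python sketch of why one missing pulse puts the rest out of step when objects are paired by position, and how pairing by (roc, slot, channel) avoids it. The `Pulse` and `Window` tuples and both pairing functions are hypothetical stand-ins, not the CDCPulseData/WindowRawData classes or the actual association code.

```python
from collections import namedtuple

# Hypothetical stand-ins for the pulse and window raw data objects.
Pulse = namedtuple("Pulse", "roc slot channel amp")
Window = namedtuple("Window", "roc slot channel samples")


def pair_by_position(pulses, windows):
    """Naive pairing: the n-th pulse goes with the n-th window."""
    return list(zip(pulses, windows))


def pair_by_channel(pulses, windows):
    """Pair each window with the pulse from the same roc/slot/channel."""
    pulse_for = {(p.roc, p.slot, p.channel): p for p in pulses}
    pairs, unmatched = [], []
    for w in windows:
        p = pulse_for.get((w.roc, w.slot, w.channel))
        if p is None:
            unmatched.append(w)   # window with no pulse data
        else:
            pairs.append((p, w))
    return pairs, unmatched


# Invented event: three windows, but the pulse for channel 41 is missing.
windows = [Window(26, 13, ch, []) for ch in (41, 42, 43)]
pulses = [Pulse(26, 13, ch, 0) for ch in (42, 43)]

misaligned = [(p.channel, w.channel) for p, w in pair_by_position(pulses, windows)]
pairs, unmatched = pair_by_channel(pulses, windows)
print(misaligned)
print([(p.channel, w.channel) for p, w in pairs], [w.channel for w in unmatched])
```

Pairing by position shifts every later pair by one, while pairing by channel leaves only the window without a pulse unmatched.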


Insufficient samples

E.g. eventnum 6520553 and 6520556 in Run003923: Naomi finds insufficient samples, Beni does not. In run 4101 there are 3 instances of insufficient samples, in evio files 001 (86 samples), 012 (98 samples) and 020 (86 samples).


recent cosmics data

(4296 and 4593 and 4594 do not have CDC data)

4594 FDC params are 

FADC125_MODE         7
FADC125_W_OFFSET     430
FADC125_W_WIDTH      80
FADC125_IE           16
FADC125_NPEAK        1

FADC125_PG          4
FADC125_P1          4
FADC125_P2          4

FADC125_IBIT        4
FADC125_ABIT        0
FADC125_PBIT        3
004003 #files= 14 modes 6 (7)  Config file in RCDB is outdated FADC125_W_WIDTH=180 FADC125_IE=80 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=4 from Dec 2015  CDC readout
004039 #files= 14 modes 6 (7)  Config file in RCDB is outdated FADC125_W_WIDTH=180 FADC125_IE=80 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1                CDC readout 

004044 #files= 2  modes 6 (7)  DAQ params look ok.  FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 from 8th Dec 2015       CDC readout 
004062 #files= 21 modes 6  7   DAQ params look ok.  FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=115
004101 #files= 22 modes 6 (7)  DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=115               CDC readout

004593 #files= 2  modes 6  7   DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=115 7 Jan 2016    No CDC data
004594 #files= 2  modes 6  7   DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=115               No CDC data

004595 #files= 2  modes 3  4  **Short modes**      FA125 params as 4594 except mode #  TS_TRIG_HOLD=30 1 BLOCKLEVEL=20 BUFFERLEVEL=8 CDC_H=115               CDC+FDC readout
004597 #files= 57 modes 3  4  **Short modes**      FA125 params as 4594 except mode #  TS_TRIG_HOLD=30 1 BLOCKLEVEL=20 BUFFERLEVEL=8 CDC_H=115               CDC+FDC readout

004701 #files= 4  modes 6 (8)  DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120               CDC readout
004706 #files= 2  modes 6 (8)  DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1                         CDC readout
004710 #files= 5  modes 6 (8)  DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1                         CDC readout  
004711 #files= 8  modes 6 (8)  DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120               CDC readout  

004715 #files= 2  modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 FDC PBit changed to 2   CDC+FDC readout
004717 #files= 4  modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120 FDC PBit=2    CDC+FDC readout
004718 #files= 5  modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120 FDC PBit=2    CDC+FDC readout

004731 #files= 59 modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1   FDC PBit=2            CDC+FDC readout 
004745 #files= 2  modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120 FDC PBit=2    CDC+FDC readout 
004746 #files= 8  modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1 CDC_H=120 FDC PBit=2    CDC+FDC readout
004747 #files= 45 modes 6 8    DAQ params look ok   FADC125_W_WIDTH=200  FADC125_IE=200 TS_TRIG_HOLD=30 1 BLOCKLEVEL=1 BUFFERLEVEL=1   FDC PBit=2            CDC+FDC readout 


Discrepancies between CDC firmware output and emulation output. evio ok = y means no crashes or other errors; (+x) means x errors from a known cause; UMO = unidentified module.

{|
! Run !! evio ok !! hits !! diffs !! time !! q !! pedestal !! amplitude !! integral !! overflow count !! readout !!
|-
| 4044 || y || 12593944 || 24 (+67) 0.0007% || 0 || 0 || 0 || 23 (+56) || 1 (+11) || 0 || CDC ||
|-
| 4062 || y || 207661715 || 503 (+1151) total 0.0008% || 30 || 0 || 11 || 440 (+1070) || 24 (+81) || 0 || CDC ||
|-
| 4101 || n || 237994979 || 560 (+1077) total 0.0007% || 61 || 0 || 13 || 473 (+989) || 13 (+88) || 1 || CDC || 3 hits with lost samples, 6 UMO
|-
| 4595 || y || 0 || no sample data || 0 || 0 || 0 || 0 (+0) || 0 (+0) || 0 || CDC&FDC? || short mode
|-
| 4701 || y || 53322126 || 1462 (+34) total 0.003% || 154 || 2 || 17 || 1297 (+20) || 24 (+14) || 0 || CDC ||
|-
| 4706 || n || 21954660 || 560 (+15) total 0.003% || 54 || 0 || 3 || 494 (+13) || 15 (+2) || 1 || CDC || 1 lost pulse, 1 hit with lost samples, 4 UMO
|-
| 4710 || y || 71126614 || 1903 (+65) total 0.003% || 154 || 0 || 9 || 1730 (+37) || 24 (+28) || 0 || CDC ||
|-
| 4711 || n || 133839484 || 3891 (+76) total 0.003% || 398 || 1 || 27 || 3461 (+38) || 56 (+38) || 1 || CDC || 1 hit with lost samples, 1 nasty error
|-
| 4715 || n || 13973245 || 415 (+10) total 0.003% || 29 || 0 || 2 || 383 (+6) || 5 (+4) || 0 || CDC&FDC || 1 hit with lost samples
|-
| 4717 || y || 40104725 || 1112 (+39) total 0.003% || 91 || 0 || 8 || 1010 (+21) || 16 (+18) || 0 || CDC&FDC ||
|-
| 4718 || n || 45419513 || 1227 (+31) total 0.003% || 106 || 0 || 6 || 1109 (+16) || 20 (+15) || 0 || CDC&FDC || 1 lost pulse
|-
| 4745 || y || 16760886 || 401 (+9) total 0.0024% || 29 || 0 || 3 || 366 (+4) || 3 (+5) || 0 || CDC&FDC ||
|-
| 4746 || y || 81035281 || 2149 (+53) total 0.003% || 158 || 1 || 19 || 1964 (+30) || 34 (+23) || 0 || CDC&FDC ||
|}
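
The percentage quoted in the diffs column appears to be the unexplained differences plus the known-cause (+x) differences, divided by the number of hits. The Python sketch below checks that reading against three rows of the table. The formula is inferred from the numbers, not stated on this page, and the function name is hypothetical.

```python
def discrepancy_percent(hits, diffs, known_cause):
    """Total discrepancies (unexplained + known cause) as a percentage of hits.

    Assumed formula, inferred from the table rows below.
    """
    return 100.0 * (diffs + known_cause) / hits


# (run, hits, diffs, known-cause diffs) copied from the table
rows = [
    (4044, 12593944, 24, 67),
    (4062, 207661715, 503, 1151),
    (4745, 16760886, 401, 9),
]

for run, hits, diffs, known in rows:
    print(run, round(discrepancy_percent(hits, diffs, known), 4))
```

Rounded to four decimal places, these rows give the 0.0007%, 0.0008% and 0.0024% listed in the table.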


4593 & 4594 showed 0 pulse data


EVIO problems

4101 6 instances of Unknown module type 15;  3 instances of insufficient samples, in evio files 001 (86 samples), 012 (98 samples) and 020 (86 samples).

4706 unknown module types 15, 9, 11, 11; 1 instance of insufficient samples, in evio file 000 (104 samples), 1 missing CDCPulse in file 000 eventnum 458209 roc 26 slot 13 chan 41 trig 54590945

4711 unknown module types 15, 15, 15, 9, 11 in file 000; 2 instances of insufficient samples, in evio file 001 (84 samples) and 002 (86 samples); segfaults file 000

4715 1 instance of unknown module type 15; 1 instance of insufficient samples, in evio file 000 (104 samples/100 samples) 

4718 1 missing CDCPulse in evio file 000, eventnum 683052 roc 26 slot 4 chan 17 trig 16804908 

4747 unknown module type 15; 1 instance of insufficient samples in evio file 009 (84/96 samples); missing CDC Pulses from file 021 event 18544541

Insufficient samples error: The number of samples passed into the fa125_algos routine (86) is less than the minimum required by the parameters in use (171). 
Parameter WE (150) should be decreased to 65 or less.
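
The numbers in this message are consistent with a fixed margin between WE and the minimum sample count: 171 - 150 = 21, and 86 - 21 = 65. A minimal Python sketch of that arithmetic, assuming the margin is constant for the parameters in use (the function name is hypothetical and is not part of fa125_algos):

```python
def max_allowed_we(nsamples, we, min_required):
    """Largest WE that fits in `nsamples`, assuming the minimum required
    sample count is WE plus a fixed margin (inferred, not documented here)."""
    margin = min_required - we      # 171 - 150 = 21 for the quoted message
    return nsamples - margin


# Values quoted in the error message above.
limit = max_allowed_we(nsamples=86, we=150, min_required=171)
print(limit)
```

Under the same assumption, the other truncated events listed above (84, 98 or 104 samples) would each give their own limit of nsamples - 21.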


Segfaults

hd_root segfaults on run 4039, which had strange config parameters.

4039 hd_root segfault
#4  0x00007fde351937ef in TUnixSystem::DispatchSignals(ESignals) () from /home/gluex/root/v5-34-14_rhel6//lib/libCore.so
#5  <signal handler called>
#6  0x0000003c92b90048 in main_arena () from /lib64/libc.so.6
#7  0x000000000057d14b in MyProcessor::~MyProcessor (this=0x218e6d0, __in_chrg=<value optimized out>) at programs/Analysis/hd_root/MyProcessor.cc:51
#8  0x000000000057d419 in MyProcessor::~MyProcessor (this=0x218e6d0, __in_chrg=<value optimized out>) at programs/Analysis/hd_root/MyProcessor.cc:57
#9  0x0000000000581315 in main (narg=16, argv=0x7ffd8b606c18) at programs/Analysis/hd_root/hd_root.cc:47
===========================================================


The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x0000003c92b90048 in main_arena () from /lib64/libc.so.6
#7  0x000000000057d14b in MyProcessor::~MyProcessor (this=0x218e6d0, __in_chrg=<value optimized out>) at programs/Analysis/hd_root/MyProcessor.cc:51
#8  0x000000000057d419 in MyProcessor::~MyProcessor (this=0x218e6d0, __in_chrg=<value optimized out>) at programs/Analysis/hd_root/MyProcessor.cc:57
#9  0x0000000000581315 in main (narg=16, argv=0x7ffd8b606c18) at programs/Analysis/hd_root/hd_root.cc:47
===========================================================


Segmentation fault (core dumped)
004711 
JANA ERROR>>
JANA ERROR>>Stack trace:
JANA ERROR>>
JANA ERROR>>   jana::JException::getStackTrace(bool, unsigned long)
JANA ERROR>>   jana::JException::JException(std::string const&)
JANA ERROR>>   JEventSource_EVIO::ParseF1TDCBank(int, unsigned int const*&, unsigned int const*, std::list<JEventSource_EVIO::ObjList*, std::allocator<JEventSource_EVIO::ObjList*> >&)
JANA ERROR>>   JEventSource_EVIO::ParseJLabModuleData(int, unsigned int const*&, unsigned int const*, std::list<JEventSource_EVIO::ObjList*, std::allocator<JEventSource_EVIO::ObjList*> >&)
JANA ERROR>>   JEventSource_EVIO::ParseEVIOEvent(evio::evioDOMTree*, std::list<JEventSource_EVIO::ObjList*, std::allocator<JEventSource_EVIO::ObjList*> >&)
JANA ERROR>>   JEventSource_EVIO::ParseEvents(JEventSource_EVIO::ObjList*)
JANA ERROR>>   JEventSource_EVIO::GetObjects(jana::JEvent&, jana::JFactory_base*)
JANA ERROR>>   jerror_t jana::JEvent::GetObjects<DCDCDigiHit>(std::vector<DCDCDigiHit const*, std::allocator<DCDCDigiHit const*> >&, jana::JFactory_base*)
JANA ERROR>>   jana::JFactory<DCDCDigiHit>* jana::JEventLoop::GetFromFactory<DCDCDigiHit>(std::vector<DCDCDigiHit const*, std::allocator<DCDCDigiHit const*> >&, char const*, jana::JEventLoop::data_source_t&, bool)
JANA ERROR>>   jana::JFactory<DCDCDigiHit>* jana::JEventLoop::Get<DCDCDigiHit>(std::vector<DCDCDigiHit const*, std::allocator<DCDCDigiHit const*> >&, char const*, bool)
JANA ERROR>>   JEventProcessor_CDC_em::evnt(jana::JEventLoop*, unsigned long)
JANA ERROR>>   jana::JEventLoop::OneEvent()
JANA ERROR>>   jana::JEventLoop::Loop()
JANA ERROR>>   LaunchThread(void*)
JANA ERROR>>   LaunchThread(void*)
JANA ERROR>>   LaunchThread(void*)
JANA ERROR>>
JANA ERROR>>