https://halldweb1.jlab.org/wiki/api.php?action=feedcontributions&user=Romanov&feedformat=atomGlueXWiki - User contributions [en]2024-03-28T08:29:40ZUser contributionsMediaWiki 1.24.1https://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_July_26,_2017&diff=83128GlueX Offline Meeting, July 26, 20172017-07-26T15:02:29Z<p>Romanov: /* Agenda */</p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, July 26, 2017<br><br />
11:00 am EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## Major IT Outage Set for Saturday, July 29<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-July/002869.html Added capabilities in osg run scripts] (Richard, Sean)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-July/002886.html Reconstruction Launch: 2017-01 ver01] (Alex A.)<br />
## [https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2017q3/000155.html Singularity 2.3.1 install for the farm] (Mark)<br />
# Review of [[GlueX Offline Meeting, July 12, 2017#Minutes|minutes from the last meeting]] (all)<br />
# HDvis update (Dmitry, Thomas)<br />
#* [https://halldweb.jlab.org/talks/2017/HDvis2/js/event.html Sample Event]<br />
#* [https://docs.google.com/presentation/d/1JDnQ04ZeXHP8N3hPsoF-H9sZbhZiSWZdd71UPFOcIdo/edit?usp=sharing HDvis architecture]<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
# Review of [https://groups.google.com/forum/#!forum/gluex-software recent discussion on the GlueX Software Help List] (all)<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_June_28,_2017&diff=82849GlueX Offline Meeting, June 28, 20172017-06-28T12:45:51Z<p>Romanov: Add ccdb updates</p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, June 28, 2017<br><br />
11:00 am EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-June/002826.html Nightly builds include hdgeant4 and gluex_root_analysis] (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-June/002828.html HDPM 0.7.1] (Nathan)<br />
## Status of HDvis, the GlueX 3-D Event Display (Thomas, Dmitry)<br />
## Status of REST production [https://halldweb.jlab.org/data_monitoring/recon/summary_swif_output_recon_2017-01_ver01_batch01.html 2017-01 ver01 batch01] (Alex)<br />
# [https://mailman.jlab.org/pipermail/halld-offline/2017-June/002827.html Progress on using the OSG] (Richard)<br />
# HDPM support (Nathan)<br />
# [https://docs.google.com/presentation/d/1I4HwOJvdjp5HgRyDrGdSxHKOUsw8QfE-mGETosQI16Q/edit?usp=sharing CCDB update] (Dmitry)<br />
# Review of [[GlueX Offline Meeting, May 31, 2017#Minutes|minutes from the last meeting]] (all)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
# Review of [https://groups.google.com/forum/#!forum/gluex-software recent discussion on the GlueX Software Help List].<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_May_31,_2017&diff=82532GlueX Offline Meeting, May 31, 20172017-05-31T15:07:22Z<p>Romanov: /* Agenda */</p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, May 31, 2017<br><br />
11:00 am EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002753.html AmpTools moved to GitHub] (Matt)<br />
## MCwrapper [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002766.html 1.6] and [https://groups.google.com/forum/#!topic/gluex-software/3Mmm_vI0VT8 1.7] (Thomas)<br />
## "New" packages in build_scripts: AmpTools, hdgeant4, gluex_root_analysis, hd_utilities (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002768.html Certificates and GitHub] (Mark)<br />
## [https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2017q2/000147.html SciComp outage, June 1] (Mark)<br />
## [https://halldweb.jlab.org/data_monitoring/recon/summary_swif_output_recon_2016-02_ver04_batch01.html REST production 2016-02 ver04] (Alex)<br />
# Review of [[GlueX Offline Meeting, April 19, 2017#Minutes|minutes from the last meeting]] (all)<br />
# [https://github.com/JeffersonLab/sim-recon/branches Stale Branch Policy] (Mark)<br />
# 3D Event Display [https://docs.google.com/presentation/d/1LsaelWTnzkBUzuEs7yHPGq7nzwQq0aVH9VlemOZqQZ8/edit?usp=sharing more slides] (Dmitry, Thomas)<br />
# [[Media:20170531_KNL_benchmark.pdf|Benchmarking on KNL]] (David)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
# Review of [https://groups.google.com/forum/#!forum/gluex-software recent discussion on the GlueX Software Help List].<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_May_31,_2017&diff=82531GlueX Offline Meeting, May 31, 20172017-05-31T15:06:51Z<p>Romanov: /* Agenda */</p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, May 31, 2017<br><br />
11:00 am EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002753.html AmpTools moved to GitHub] (Matt)<br />
## MCwrapper [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002766.html 1.6] and [https://groups.google.com/forum/#!topic/gluex-software/3Mmm_vI0VT8 1.7] (Thomas)<br />
## "New" packages in build_scripts: AmpTools, hdgeant4, gluex_root_analysis, hd_utilities (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2017-May/002768.html Certificates and GitHub] (Mark)<br />
## [https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2017q2/000147.html SciComp outage, June 1] (Mark)<br />
## [https://halldweb.jlab.org/data_monitoring/recon/summary_swif_output_recon_2016-02_ver04_batch01.html REST production 2016-02 ver04] (Alex)<br />
# Review of [[GlueX Offline Meeting, April 19, 2017#Minutes|minutes from the last meeting]] (all)<br />
# [https://github.com/JeffersonLab/sim-recon/branches Stale Branch Policy] (Mark)<br />
# 3D Event Display [https://docs.google.com/presentation/d/1LsaelWTnzkBUzuEs7yHPGq7nzwQq0aVH9VlemOZqQZ8/edit?usp=sharing (more slides)] (Dmitry, Thomas)<br />
# [[Media:20170531_KNL_benchmark.pdf|Benchmarking on KNL]] (David)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
# Review of [https://groups.google.com/forum/#!forum/gluex-software recent discussion on the GlueX Software Help List].<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=82097HOWTO Run Gluex Software on Windows2017-04-24T17:47:57Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
'''How?''' - Windows now comes with a new feature known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. In principle, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. In effect, it IS Windows running an Ubuntu userland. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or copy and paste commands from askubuntu.com, and they will just work. Because of this, GlueX software runs out of the box on WSL, whereas building a big codebase on Cygwin is often a pain. <br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
<nowiki />* - '''Windows 15063.11''' - known as the "Creators Update". The official release was on '''April 11, 2017'''. Historically, when WSL first appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and click the switch).<br />
<br />
* Open "Turn Windows features on or off" and enable "Windows Subsystem for Linux" there.<br />
<br />
* Open a terminal and type ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal.<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is rather poor, there are several alternatives:<br />
<br />
* MobaXterm (which has a free version) is a highly customizable multi-tab terminal with X11 support out of the box.<br />
<br />
* [https://github.com/goreliu/wsl-terminal WSL-terminal] - includes mintty, wslbridge, cbwin, and some other useful tools<br />
<br />
* Finally, you can install xfce4-terminal in WSL (''sudo apt-get install xfce4-terminal''); see this [http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question] for more details.<br />
<br />
<br />
X11:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server]. Don't use VcXsrv; it has problems with OpenGL. Skip this step if you use MobaXterm, which ships with X11 support. <br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc <br />
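
The .bashrc step can also be scripted so that repeated runs do not add the line twice. The following is only a minimal sketch in Python; the function name <code>ensure_display_export</code> is made up for this illustration, and only the <code>export DISPLAY=:0</code> line itself comes from the instructions above.

```python
from pathlib import Path

# The line recommended above for the X11 setup.
EXPORT_LINE = "export DISPLAY=:0"


def ensure_display_export(bashrc: Path) -> bool:
    """Append the DISPLAY export to the given file unless it is already there.

    Returns True if the file was modified, False if nothing had to be done.
    (Hypothetical helper, not part of any GlueX tool.)
    """
    text = bashrc.read_text() if bashrc.exists() else ""
    if any(line.strip() == EXPORT_LINE for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with bashrc.open("a") as handle:
        handle.write(f"{prefix}{EXPORT_LINE}\n")
    return True
```

Inside WSL one would call it as <code>ensure_display_export(Path.home() / ".bashrc")</code> and then open a new shell (or source the file) for the setting to take effect.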
<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
> sudo apt install build-essential<br />
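
To check whether the compilers are already present before (or after) running the command above, something like the following Python sketch can be used. It assumes that gcc, g++ and make are the tools of interest (build-essential pulls them in on Ubuntu); the helper name <code>missing_tools</code> is made up for this illustration.

```python
import shutil


def missing_tools(tools=("gcc", "g++", "make")):
    """Return the subset of tools that cannot be found on PATH.

    The default list is an assumption about what a build needs;
    adjust it as required.
    """
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from <code>missing_tools()</code> suggests the basic toolchain is in place; any names returned are still to be installed.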
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual] including installation of prerequisites printed by hdpm.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but this requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and ran fast (was it rendered with DirectX under the hood???) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR''' Our software was tested and runs on WSL, which is an Ubuntu userland running on the Windows kernel inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=82096HOWTO Run Gluex Software on Windows2017-04-24T17:17:16Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
'''How?''' - Windows now comes with a new feature known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. In principle, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. In effect, it IS Windows running an Ubuntu userland. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or copy and paste commands from askubuntu.com, and they will just work. Because of this, GlueX software runs out of the box on WSL, whereas building a big codebase on Cygwin is often a pain. <br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
<nowiki />* - '''Windows 15063.11''' - known as the "Creators Update". The official release was on '''April 11, 2017'''. Historically, when WSL first appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and click the switch).<br />
<br />
* Open a terminal and type ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal.<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
X11:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL).<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc <br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is rather poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
> sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual] including installation of prerequisites printed by hdpm.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but this requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and ran fast (was it rendered with DirectX under the hood???) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR''' Our software was tested and runs on WSL, which is an Ubuntu userland running on the Windows kernel inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=Run_Periods&diff=81946Run Periods2017-04-10T20:24:58Z<p>Romanov: </p>
<hr />
<div>'''Summary information about current Run Periods'''<br />
<br />
{|class="wikitable" style="text-align:center;"<br />
! Name !! Run Range !! Date Range !! beam !! Data on Tape !! Total Events !! Notes<br />
|-<br />
| RunPeriod-2017-01 || style="text-align:left;" | 30000 &ndash; || style="text-align:left;" | 23 Jan. 2017 &ndash; 13 Mar. 2017 || &#10003; || &ndash; || &ndash; || style="text-align:left;" | 12 GeV e<sup>-</sup><br />
|-<br />
| RunPeriod-2016-10 || style="text-align:left;" | 20000 &ndash; || style="text-align:left;" | 15 Sept. 2016 &ndash; 21 Dec. 2016|| &#10003; || &ndash; || &ndash; || style="text-align:left;" | 12 GeV e<sup>-</sup><br />
|-<br />
| RunPeriod-2016-02 || 10000 &ndash; 12109 || style="text-align:left;" | 28 Jan. 2016 &ndash; 8 Sept. 2016|| &#10003; || 570 TB || 39.8B || style="text-align:left;" | Commissioning, 12 GeV e<sup>-</sup><br />
|-<br />
| RunPeriod-2015-12 || 3939 &ndash; 4807 || style="text-align:left;" | 1 Dec. 2015 &ndash; 28 Jan. 2016 || &#10003; || 9 TB || &ndash; || style="text-align:left;" | Commissioning, 12 GeV e<sup>-</sup>, Cosmics<br />
|-<br />
| RunPeriod-2015-06 || 3386 &ndash; 3938 || style="text-align:left;" | 29 May 2015 &ndash; 1 Dec. 2015 || || 53 TB || &ndash; || style="text-align:left;" | Cosmics<br />
|-<br />
| RunPeriod-2015-03 || 2607 &ndash; 3385 || style="text-align:left;" | 11 Mar. 2015 &ndash; 29 May 2015|| &#10003; || 74 TB || 1285M || style="text-align:left;" | Commissioning, 5.5 GeV e<sup>-</sup><br />
|-<br />
| RunPeriod-2015-01 || 2440 &ndash; 2606 || 6 Feb. 2015 &ndash; 11 Mar. 2015 || || 11 TB || 225M || style="text-align:left;" | Cosmics<br />
|-<br />
| RunPeriod-2014-10 || 630 &ndash; 2439 || 28 Oct. 2014 &ndash; 21 Dec. 2014 || &#10003; || 120 TB || 932M* || style="text-align:left;" | Commissioning, 10 GeV e<sup>-</sup><br />
|}<br />
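
The "Total Events" column is filled from a SUM over the <code>runs</code> table of the hd_runlog.sqlite file (see the notes further down). As a rough sketch of how that query could be run from Python with the standard <code>sqlite3</code> module: only the table name <code>runs</code> and the columns <code>run</code> and <code>Nevents</code> are taken from this page; the helper names and the in-memory demo rows are made up for illustration.

```python
import sqlite3

# Runs reporting this many events or more are treated as bogus
# (one early run has Nevents = 2^32).
MAX_SANE_NEVENTS = 1_000_000_000


def total_events(conn, first_run, last_run):
    """Sum Nevents over an inclusive run range, skipping bogus counts."""
    row = conn.execute(
        "SELECT SUM(Nevents) FROM runs "
        "WHERE run >= ? AND run <= ? AND Nevents < ?",
        (first_run, last_run, MAX_SANE_NEVENTS),
    ).fetchone()
    # SUM() yields NULL (None) when no row matches.
    return row[0] or 0


def demo_total():
    """Run the query against a tiny in-memory table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run INTEGER, Nevents INTEGER)")
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?)",
        [(2607, 100), (3000, 250), (3385, 50), (3100, 2**32), (3386, 999)],
    )
    return total_events(conn, 2607, 3385)
```

For the real numbers one would instead open the database with <code>sqlite3.connect("/gluonraid1/monitoring/hd_runlog.sqlite")</code> on the gluon cluster and pass the run range of the period in question.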
<nowiki>*</nowiki>The total size of the data from tape does not match that from the hd_runlog.sqlite DB (120TB vs. 50TB). The validity of the 932M events is therefore unclear.<br />
<br />
<hr><br />
Notes:<br />
* Data on tape is obtained from the [https://scicomp.jlab.org/scicomp/#/jasmine/usage Scicomp Tape Library Usage Web page]. One needs to select the start and end dates, "Data Written" for the data type, and then "halld". Hover the mouse over the far right side of the resulting plot to obtain total tape usage for run period.<br />
* Number of events was obtained from the SQLite file: /gluonraid1/monitoring/hd_runlog.sqlite on the gluon cluster. A query such as the following is done:<br />
*:''SELECT SUM(Nevents) FROM runs WHERE run>=2607 AND run<=3385 AND Nevents<1000000000;''<br />
*: ''n.b. the Nevents<1000000000 clause is because there was one early run with Nevents=2^32, an obviously incorrect value.''</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81850HOWTO Run Gluex Software on Windows2017-04-02T16:55:54Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. In principle, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. In effect, it IS Windows running an Ubuntu userland. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or copy and paste commands from askubuntu.com, and they will just work. Because of this, GlueX software runs out of the box on WSL, whereas building a big codebase on Cygwin is often a pain. <br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but a release candidate is available now. Historically, when WSL first appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and click the switch).<br />
<br />
* Open a terminal and type ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal.<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
X11:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL).<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc <br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is rather poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
> sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual] including installation of prerequisites printed by hdpm.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but this requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and ran fast (was it rendered with DirectX under the hood???) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR''' Our software was tested and runs on WSL, which is an Ubuntu userland running on the Windows kernel inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81849HOWTO Run Gluex Software on Windows2017-04-02T16:55:16Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. In principle, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. In effect, it IS Windows running an Ubuntu userland. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or copy and paste commands from askubuntu.com, and they will just work. Because of this, GlueX software runs out of the box on WSL, whereas building a big codebase on Cygwin is often a pain. <br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but a release candidate is available now. Historically, when WSL first appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and click the switch).<br />
<br />
* Open a terminal and type ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal.<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
X11:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL).<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc <br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is rather poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
> sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual] including installation of prerequisites printed by hdpm.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but this requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and ran fast (was it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
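<br />
For files kept on a Windows disk, as in the JANA test above, the Windows path has to be written as a Linux path. The sketch below assumes that the drive is mounted under ''/mnt/<letter>'', which is how WSL exposes fixed Windows drives; whether an external disk appears there may depend on the Windows build. ''win_to_wsl_path'' is a hypothetical helper, and the sample path is invented.<br />

```shell
# Hypothetical helper: convert a Windows path such as D:\data\sim.hddm
# into the corresponding WSL path, assuming drives live under /mnt/<letter>.
win_to_wsl_path() {
    win="$1"
    # Drive letter: everything before the first colon, lower-cased.
    drive="$(printf '%s' "${win%%:*}" | tr '[:upper:]' '[:lower:]')"
    # Remainder of the path with backslashes turned into forward slashes.
    rest="$(printf '%s' "${win#*:}" | tr '\\' '/')"
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Usage: the input path is made up for illustration.
win_to_wsl_path 'D:\data\sim.hddm'
```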
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81848HOWTO Run Gluex Software on Windows2017-04-02T15:16:44Z<p>Romanov: /* Run GlueX software */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and ran fast (was it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81847HOWTO Run Gluex Software on Windows2017-04-02T15:15:52Z<p>Romanov: /* Run GlueX software */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL and runs fast (was it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81846HOWTO Run Gluex Software on Windows2017-04-02T15:12:13Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
* '''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81845HOWTO Run Gluex Software on Windows2017-04-02T15:09:26Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. In the "Creators Update" it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on Windows 15063.11 "Creators Update". It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81844HOWTO Run Gluex Software on Windows2017-04-02T15:06:39Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR:''' Our software was tested and runs on WSL, which is Ubuntu on top of the Windows kernel, running inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81843HOWTO Run Gluex Software on Windows2017-04-02T15:00:11Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke.<br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that is known by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows", or even "lxrun".<br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it.<br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.''<br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig.<br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software must be built or rebuilt for Cygwin using the Cygwin environment. WSL is '''intended to be binary compatible with Ubuntu''', and compatible in other respects too. So one can grab a .deb package and use it, or Ctrl+C Ctrl+V from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, whereas it is often a pain to build a big codebase on Cygwin.<br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above.<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It takes just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and turn on that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions, the installation is complete.<br />
<br />
* To run it, just type ''bash'' in a terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is a disgrace, it is a good idea to use something like xfce4-terminal instead. To do this:<br />
<br />
* Install the [https://sourceforge.net/projects/xming/ Xming X11 server] (don't use VcXsrv; it has problems with OpenGL)<br />
<br />
* Put ''export DISPLAY=:0'' in .bashrc<br />
<br />
* Install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* Run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential:<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort; see the [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher].<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it.<br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood???)<br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81842HOWTO Run Gluex Software on Windows2017-04-02T14:59:42Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software must be built or rebuilt for Cygwin using the Cygwin environment, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, while building a big codebase on Cygwin is often a pain. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including Fedora and CentOS 7, but it requires more effort: [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher]<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81841HOWTO Run Gluex Software on Windows2017-04-02T10:53:50Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software must be built or rebuilt for Cygwin using the Cygwin environment, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs out of the box on WSL, while building a big codebase on Cygwin is often a pain. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including CentOS 7, but it requires more effort: [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher]<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81840HOWTO Run Gluex Software on Windows2017-04-02T10:48:58Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software must be built or rebuilt for Cygwin, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs on WSL out of the box, while building ROOT on Cygwin, for example, is the usual pain. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including CentOS 7, but it requires more effort: [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher]<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81839HOWTO Run Gluex Software on Windows2017-04-02T10:43:36Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software is built specifically for Cygwin, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
<br />
'''Other distributions'''<br />
<br />
It is possible to install other Linux distributions, including CentOS 7, but it requires more effort: [https://github.com/RoliSoft/WSL-Distribution-Switcher WSL Distribution Switcher]<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81838HOWTO Run Gluex Software on Windows2017-04-02T10:42:25Z<p>Romanov: /* Run GlueX software */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software is built specifically for Cygwin, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry converted to ROOT was even rendered with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81837HOWTO Run Gluex Software on Windows2017-04-02T10:35:48Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''GlueX software runs on Windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it an April 1 joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature that goes by several names (as is common for all things devilish): "Windows Subsystem for Linux" ('''WSL'''), "Bash on Windows", "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. Potentially no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference. Software is built specifically for Cygwin, whereas WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but the RC is available now. Historically, when WSL appeared in 2015 as an alpha feature, it had a number of bugs. Now it is in beta; many things were significantly improved and work smoothly (like running X11 apps). Starting with the "Creators Update", WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To run it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
<br />
<br />
Because the native Windows terminal (cmd.exe) is so poor, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
 > sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files and DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, the GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81836HOWTO Run Gluex Software on Windows2017-04-02T10:34:11Z<p>Romanov: /* Installation */</p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly speaking, ''it is Ubuntu running on the Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer that translates Linux system calls into NT kernel calls. In principle, no performance is lost to virtualization, and there are no problems such as running out of space on virtual disks. WSL has full access to the Windows file system and resources. Actually, it IS Windows running an Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in a similar way, but there is a huge difference: software has to be built specifically for Cygwin, whereas WSL is '''intended to be binary compatible with Ubuntu''' (and compatible in other respects too). So one can grab a .deb package and use it, or copy and paste a command from askubuntu.com, and it will just work. Because of this, GlueX software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. Now it is in beta, and many things were significantly improved and work smoothly (like running X11 apps). Starting with the Creators Update, WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* Enable "Developer mode" (type it in the search bar and flip that switch)<br />
<br />
* Open a terminal and run ''lxrun /install''. After several questions the installation is complete.<br />
<br />
* To start it, just type ''bash'' in the terminal<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/reference MSDN manual]<br />
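<br />
As a quick sanity check after starting ''bash'', the kernel version string can be inspected to confirm that the shell really runs under WSL. This is only a sketch: it assumes that WSL reports "Microsoft" in /proc/version, and the helper name looks_like_wsl is hypothetical, not part of WSL or hdpm.<br />

```shell
# Hypothetical helper: succeed if a kernel version string looks like WSL.
# Assumption: WSL kernels include "Microsoft" in /proc/version.
looks_like_wsl() {
    printf '%s\n' "$1" | grep -qi 'microsoft'
}

# Check the kernel we are actually running on.
if looks_like_wsl "$(cat /proc/version 2>/dev/null)"; then
    echo "Running under WSL"
else
    echo "Not running under WSL"
fi
```

The match is case-insensitive on purpose, so that variations in how the kernel string is capitalized do not matter.<br />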
<br />
<br />
Because the native Windows terminal is a shame, it is a good idea to use something like xfce4-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Ask Ubuntu question with more details]<br />
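<br />
The DISPLAY line from the steps above could also be written so that it does not overwrite a value that is already set (for example, by SSH X11 forwarding). This is a possible variant for .bashrc, not something the original setup requires; ":0" is the same value as suggested above.<br />

```shell
# Point X11 clients at the X server on the Windows side,
# but keep DISPLAY unchanged if something else already set it.
export DISPLAY="${DISPLAY:-:0}"
```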
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
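<br />
To see whether the compilers are already present before or after this step, a small check like the one below can be used. It is a sketch under the assumption that gcc, g++ and make are the tools of interest (build-essential pulls them in); the helper name missing_tools is hypothetical.<br />

```shell
# Hypothetical helper: print every given command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line.
    echo "Missing build tools:" $missing
    echo "Install them with: sudo apt install build-essential"
else
    echo "Compilers and make are available"
fi
```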
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints.<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL;DR''' Our software was tested and runs on WSL, which is an Ubuntu user space running on the Windows kernel inside Windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81835HOWTO Run Gluex Software on Windows2017-04-02T10:28:32Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly saying ''it is Ubuntu running on Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. Now it is in beta, and many things were significantly improved and work smoothly (like running X11 apps). Starting with the Creators Update, WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but that is not recommended for the reasons above. <br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81834HOWTO Run Gluex Software on Windows2017-04-02T10:26:27Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly saying ''it is Ubuntu running on Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs. Now it is in beta, and many things were significantly improved and work smoothly (like running X11 apps). Starting with the Creators Update, WSL comes with Ubuntu 16.04 (it was 14.04 before).<br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too.<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81833HOWTO Run Gluex Software on Windows2017-04-02T10:20:19Z<p>Romanov: </p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly saying ''it is Ubuntu running on Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it. <br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Starting with this version, WSL comes with Ubuntu 16.04 (it was 14.04 before). Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs, such as missing stubs for D-Bus. Even though it is still in beta, many things were significantly improved and work smoothly (like running X11 apps). <br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but it is recommended to use Windows version 15063 or later.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81832HOWTO Run Gluex Software on Windows2017-04-02T10:19:59Z<p>Romanov: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. Roughly saying ''it is Ubuntu running on Windows kernel inside Windows.'' <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it. <br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Starting with this version, WSL comes with Ubuntu 16.04 (it was 14.04 before). Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs, such as missing stubs for D-Bus. Even though it is still in beta, many things were significantly improved and work smoothly (like running X11 apps). <br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but it is recommended to use Windows version 15063 or later.<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81831HOWTO Run Gluex Software on Windows2017-04-02T10:18:44Z<p>Romanov: </p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it 1 April joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface developed by Microsoft (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly saying it is Ubuntu running on Windows kernel inside Windows. <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it. <br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Starting with this version, WSL comes with Ubuntu 16.04 (it was 14.04 before). Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs, such as missing stubs for D-Bus. Even though it is still in beta, many things were significantly improved and work smoothly (like running X11 apps). <br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but it is recommended to use Windows version 15063 or later. <br />
<br />
<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers, so build-essential has to be installed before the hdpm prerequisites:
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installing the prerequisites that hdpm prints. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (such as ROOT files and DSelector) run without problems.<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood?) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=HOWTO_Run_Gluex_Software_on_Windows&diff=81830HOWTO Run Gluex Software on Windows2017-04-02T10:17:42Z<p>Romanov: Created page with "=Introduction= '''Gluex software runs on windows 10 out of the box'''* File:Bash_on_windows_chan_what.jpeg '''Is it April 1st joke?''' - No. It is April 2nd and it is n..."</p>
<hr />
<div>=Introduction=<br />
<br />
'''Gluex software runs on windows 10 out of the box'''*<br />
<br />
[[File:Bash_on_windows_chan_what.jpeg]]<br />
<br />
'''Is it April 1st joke?''' - No. It is April 2nd and it is not a joke. <br />
<br />
<br />
'''How?''' - Windows now comes with a new feature which is known by several names (common for all devilish): "Windows subsystem for linux"('''WSL''') or "Bash on Windows" or "Bash on Ubuntu on Windows" or even "lxrun". <br />
<br />
<br />
WSL provides a Linux-compatible kernel interface developed by Microsoft (containing no Linux kernel code), with user-mode binaries from Ubuntu running on top of it. <br />
<br />
<br />
Roughly saying it is Ubuntu running on Windows kernel inside Windows. <br />
<br />
<br />
Links:<br />
<br />
[https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux Wikipedia article]<br />
<br />
[https://github.com/Microsoft/BashOnWindows GitHub page]<br />
<br />
[https://msdn.microsoft.com/en-us/commandline/wsl/about MSDN page]<br />
<br />
<br />
'''Out of the box''' - WSL now comes with ALL versions of Windows 10. It is just one command to install and activate it. <br />
<br />
<br />
'''WSL vs. Virtual machines vs. Cygwin'''<br />
<br />
* '''Compared to VMs''' - WSL is a thin layer, which addresses linux system calls to NT kernel. Potentially performance is not wasted by virtualization, no problems like space on virtual disks. WSL has full access to windows file system and resources. Actually, it IS Windows running Ubuntu rig. <br />
<br />
* '''Compared to Cygwin''' - Cygwin works in similar way but there is a huge difference. Software is built for cygwin. WSL is '''intended to be binary compatible with Ubuntu'''. And other-compatible too. So one can grab .deb package and use it. Or ctrl+c ctrl+v command from askubuntu.com and it will just work. Because of this gluex software runs on WSL out of the box. <br />
<br />
<br />
<br />
'''Windows 15063.11''' - known as the "Creators Update". The official release is planned for '''April 11, 2017''', but an RC is available now. Starting with this version, WSL comes with Ubuntu 16.04 (it was 14.04 before). Historically, when WSL appeared in 2016 as an alpha feature, it had a number of bugs, such as missing stubs for D-Bus. Even though it is still in beta, many things were significantly improved and work smoothly (like running X11 apps). <br />
<br />
GlueX offline software was (roughly) tested on 15063.11. It will most probably work on older versions too, but it is recommended to use Windows version 15063 or later. <br />
<br />
<br />
<br />
=Installation=<br />
<br />
WSL installation:<br />
<br />
* One has to enable "Developer mode" (type it in the search bar, and click that switch)<br />
<br />
* Open terminal and write ''lxrun /install''. After several questions installation is complete.<br />
<br />
* To run it just print ''bash'' in terminal<br />
<br />
<br />
Because native windows terminal is a shame, it is a good idea to use something like xfce-terminal. To do this:<br />
<br />
* install [https://sourceforge.net/projects/xming/ XMing X11 server]. (don't use VcXsrv, it has problems with OpenGL)<br />
<br />
* put ''export DISPLAY=:0'' in .bashrc <br />
<br />
* install xfce4-terminal (''sudo apt-get install xfce4-terminal'')<br />
<br />
* run it: ''bash -l -c xfce4-terminal''<br />
<br />
[http://askubuntu.com/questions/827952/a-better-terminal-experience-for-windows-subsystem-for-linux Stackoverlow question with more details]<br />
<br />
<br />
'''GlueX prerequisites'''<br />
<br />
This Ubuntu version comes without compilers. Before installing the hdpm prerequisites, one also has to install build-essential<br />
<br />
> sudo apt install build-essential<br />
<br />
<br />
<br />
'''GlueX software'''<br />
<br />
Just [https://github.com/JeffersonLab/hdpm/wiki/Usage use hdpm as usual], including installation of the prerequisites printed by hdpm. <br />
<br />
<br />
=Run GlueX software=<br />
<br />
Features that were verified to work:<br />
<br />
* '''JANA and co''' - HDGeant runs OK. As an experiment, the resulting hddm file was moved to an external Windows NTFS disk, and JANA was able to process it. <br />
<br />
* '''ROOT''' - ROOT works. All features that were tested (like ROOT files, DSelector) run without problems<br />
<br />
* '''OpenGL''' - Surprisingly, GlueX geometry works even with OpenGL (is it rendered with DirectX under the hood???) <br />
[[File:Bash_on_windows_geometry_opengl.png]]<br />
<br />
* '''Geant 4''' - Has not been tested yet.<br />
<br />
<br />
<br />
'''TL; DR;''' It was tested that our software runs on WSL, which is Ubuntu with windows kernel running on windows.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=File:Bash_on_windows_geometry_opengl.png&diff=81829File:Bash on windows geometry opengl.png2017-04-02T08:24:27Z<p>Romanov: GlueX geometry on windows using OpenGl</p>
<hr />
<div>GlueX geometry on windows using OpenGl</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=File:Bash_on_windows_chan_what.jpeg&diff=81828File:Bash on windows chan what.jpeg2017-04-02T07:06:02Z<p>Romanov: Gluex software works on windows out of the box. Almost.</p>
<hr />
<div>Gluex software works on windows out of the box. Almost.</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_March_22,_2017&diff=81692GlueX Offline Meeting, March 22, 20172017-03-22T15:04:09Z<p>Romanov: </p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, March 22, 2017<br><br />
11:00 am EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## New simple email list: duplicate_cache<br />
## CCDB 1.06.03<br />
## Upcoming SWIF features (Sean)<br />
## MCwrapper changes and progress (Thomas)<br />
# Review of [[GlueX Offline Meeting, March 8, 2017#Minutes|minutes from the last meeting]] (all)<br />
# Plugin linking change proposal (David)<br />
# [https://docs.google.com/presentation/d/1VyREyJygfBNz9pra3-1FIM7YUNo3Q8EopmhhUvAcVW0/edit?usp=sharing RCDB/CCDB Plans] (Dmitry)<br />
# [[Media:20170322_jana_updates.pdf|JANA 0.7.8]] (David)<br />
# Merging events into a simulated data stream. (Richard)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
#* hdds from ccdb<br />
# Review of [https://groups.google.com/forum/#!forum/gluex-software recent discussion on the Gluex Software Help List].<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_April_27,_2016&diff=74785GlueX Offline Meeting, April 27, 20162016-04-27T17:35:22Z<p>Romanov: </p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, April 27, 2016<br><br />
1:30 pm EDT<br><br />
JLab: CEBAF Center F326/327<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002307.html GCC 4.9.2 on farm] (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002310.html Analysis Scripts Moved] (Paul)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002313.html new git repository: hd_utilities] (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002314.html bug fix release: sim-recon-1.11.1] (Mark)<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002297.html GlueX Software Help Email List] (Mark)<br />
## Other announcements?<br />
# Review of [[GlueX Offline Meeting, April 13, 2016#Minutes|minutes from April 13]] (all)<br />
# [https://halldweb.jlab.org/wiki/images/5/5e/Sdobbs_OfflineMtg_27Apr16.pdf Calibration Challenge/Processing] (Sean)<br />
# Spring 2016 Run, Processing Plans<br />
# Offline Monitoring as we move forward to the fall. <br />
# [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002311.html GlueX TTree DSelector] (Paul)<br />
# [https://mailman.jlab.org/pipermail/halld-offline/2016-April/002312.html Viewing DEPICSvalue with hd_dump] (David)<br />
# Transition to C++11 (David)<br />
# [https://docs.google.com/presentation/d/1_uhBIGhZOFy0v41ihW7RXVvonCy2LjrQ81dOAJ3FcFk/edit?usp=sharing CCDB and RCDB] (Dmitry)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2016</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2016/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Offline_Meeting,_March_30,_2016&diff=74090GlueX Offline Meeting, March 30, 20162016-03-30T18:33:56Z<p>Romanov: /* Agenda */</p>
<hr />
<div>GlueX Offline Software Meeting<br><br />
Wednesday, March 30, 2016<br><br />
3:00 pm EDT<br><br />
JLab: CEBAF Center L207<br />
<br />
==Agenda==<br />
<br />
# Announcements<br />
## [https://mailman.jlab.org/pipermail/halld-offline/2016-March/002275.html sim-recon version 1.10.0 released]<br />
## [https://mailman.jlab.org/pipermail/jlab-scicomp-briefs/2016q1/000121.html Change to Farm Priority Scheme]<br />
# Review of [[GlueX Offline Meeting, March 2, 2016#Minutes|minutes from March 2]] (all)<br />
# [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2984 Reconstruction Scaling] (Paul)<br />
# Calibration Challenge/Processing (Sean)<br />
# NSB not quite zero (Mike S.)<br />
# [https://github.com/JeffersonLab/ccdb/issues/19 CCDB Improvements] ([https://docs.google.com/presentation/d/1wk7SYDhmOdZM0smBDGda4QeonT90Q4SWlhD-7ZA0pME/edit?usp=sharing Slides])(Sean, Dmitry)<br />
<!-- # Geant4 Update (Richard, David) --><br />
# [https://www.jlab.org/conferences/trends2016/program.html Future Trends in Nuclear Physics Computing Workshop] (all)<br />
# Access to Fairshare Information from JLab Farm (all)<br />
# Review of [https://github.com/JeffersonLab/sim-recon/pulls?q=is%3Aopen+is%3Apr recent pull requests] (all)<br />
<!-- # [[Data Challenge 3]] update (Mark) --><br />
# Action Item Review<br />
<br />
==Communication Information==<br />
<br />
===Remote Connection===<br />
<br />
* The BlueJeans meeting number is 968 592 007 .<br />
* [http://bluejeans.com/968592007 Join the Meeting] via BlueJeans<br />
<br />
===Slides===<br />
<br />
Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2016</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2016/ .</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_Conditions_Help&diff=73405RCDB Conditions Help2016-02-29T20:20:20Z<p>Romanov: Added polarimeter_converter condition</p>
<hr />
<div>This page gives definitions of the condition variables stored in the RCDB as of RunPeriod-2016-02.<br />
<br />
{| class="wikitable"<br />
! Condition Name !! EPICS variable !! Notes <br />
|-<br />
| beam_current || IBCAD00CRCUR6 || Beam current from electron dump (?) BCM, averaged over run, in nA<br />
|-<br />
| beam_energy || HALLD:p || Measured electron beam energy, in GeV<br />
|-<br />
| coherent_peak || HD:CBREM:REQ_EDGE || Requested high-energy edge of the coherent peak, in GeV<br />
|-<br />
| collimator_diameter || hd:collimator_at_[block,a,b] || 5 mm, 3.4 mm, blocking<br />
|-<br />
| luminosity || N/A || For future use<br />
|-<br />
| ps_converter || hd:converter_at_[home,a,b,c] || 5 x 10<sup>-3</sup> RL, 1 x 10<sup>-3</sup> RL, 3 x 10<sup>-4</sup> RL, Retracted<br />
|-<br />
| solenoid_current || HallD-PXI:Data:I_Shunt || Measured current in solenoid magnet, in A<br />
|-<br />
| status || N/A || For future use<br />
|-<br />
| radiator_index || HD:GONI:RADIATOR_INDEX || Goniometer radiator position index, as described [https://halldsvn.jlab.org/repos/trunk/controls/epics/app/goniApp/Db/goni.substitutions in this page].<br />
|-<br />
| radiator_id || HD:GONI:RADIATOR_ID || Goniometer radiator unique ID, as described [https://halldsvn.jlab.org/repos/trunk/controls/epics/app/goniApp/Db/goni.substitutions in this page].<br />
|-<br />
| polarization_direction || HD:CBREM:PLANE || Direction of the photon beam linear polarization with respect to the floor: PARA, PERP, or UNKNOWN<br />
|-<br />
| radiator_type || HD:GONI:RADIATOR_NAME OR hd:radiator_at_[a,b,c] || Goniometer radiator name, as described [https://halldsvn.jlab.org/repos/trunk/controls/epics/app/goniApp/Db/goni.substitutions in this page], Amorphous radiator (2 x 10<sup>-5</sup> RL, 1 x 10<sup>-4</sup> RL, 3 x 10<sup>-4</sup> RL), or retracted<br />
|-<br />
| target_type || HLD:TGT:status || Status of LH2 target: OFF, Cooling, Filling, FULL & Ready, Emptying, EMPTY & Ready<br />
|-<br />
| polarimeter_converter || hd:polarimeter_at_* || Polarimeter's converter. Values: RETRACTED/Be 50um/Be 75um/Be 750um<br />
|}<br />
<br />
'''Other Notes'''<br />
<br />
* When the nA BPMs are calibrated, we could use another variable for the beam current that might be more accurate for lower (<1 uA) currents, e.g. IPM5C11.VAL,IPM5C11A.VAL,IPM5C11B.VAL,IPM5C11C.VAL</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_Standard_Searches&diff=73281RCDB Standard Searches2016-02-24T18:27:39Z<p>Romanov: /* New */</p>
<hr />
<div>= New = <br />
<br />
Suggested update:<br />
<br />
<syntaxhighlight lang="python"><br />
@is_production = (run_type == 'hd_all.tsg' or run_type == 'hd_all.tsg.ps') and # Production Run? Should double-check this with Sascha. It might not be true for mode 8?<br />
daq_run == 'PHYSICS' and <br />
beam_current > 2 and <br />
num_events > 100000 and <br />
(radiator_id !=5 or radiator_id == 0) and # diamond or amorphous radiator is in<br />
solenoid_current > 100<br />
<br />
<br />
@is_cosmic = run_type == 'hd_all.tsg_cosmic' and<br />
'COSMIC' in daq_run and<br />
beam_current < 1.<br />
<br />
<br />
@is_empty_target = target_type == "EMPTY & Ready" # Empty Target<br />
<br />
<br />
@is_amorph_radiator = radiator_id == 0 and target_type == "FULL & Ready" # Amorphous Radiator<br />
<br />
<br />
@is_coherent_beam = radiator_id != 5 and radiator_id != 0 and target_type == "FULL & Ready" # Coherent Beam<br />
<br />
<br />
@is_field_off = solenoid_current < 100 # Field Off<br />
<br />
<br />
@is_field_on = solenoid_current >= 100 # Field On<br />
<br />
<br />
@status_approved = status == 1<br />
<br />
<br />
@status_unchecked = status == -1<br />
<br />
<br />
@status_reject = status == 0<br />
<br />
</syntaxhighlight><br />
<br />
[https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb/python/rcdb/alias.py The exact file with aliases (alias.py)]<br />
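<br />
To make the logic of the suggested aliases easier to check, the sketch below rewrites ''@is_production'' as an ordinary Python function over a dictionary of condition values. This is an illustration only, not how RCDB evaluates aliases; the function name and the sample values are invented.<br />

```python
def is_production(cond):
    """Plain-Python mirror of the suggested @is_production alias."""
    return (
        cond["run_type"] in ("hd_all.tsg", "hd_all.tsg.ps")
        and cond["daq_run"] == "PHYSICS"
        and cond["beam_current"] > 2
        and cond["num_events"] > 100000
        # Same test as in the alias; logically it reduces to radiator_id != 5
        and (cond["radiator_id"] != 5 or cond["radiator_id"] == 0)
        and cond["solenoid_current"] > 100
    )


# Hypothetical condition values, invented for illustration only
example_run = {
    "run_type": "hd_all.tsg",
    "daq_run": "PHYSICS",
    "beam_current": 50.0,
    "num_events": 2500000,
    "radiator_id": 2,
    "solenoid_current": 1200.0,
}

print(is_production(example_run))
print(is_production(dict(example_run, radiator_id=5)))
```

Writing the alias out this way also shows that the radiator test accepts every radiator_id except 5.<br />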
<br />
= Old = <br />
<br />
Goal: Classify and search for non-DAQ test runs<br />
<br />
* Type: Production Run<br />
** run_type == hd_all.tsg<br />
** daq_run == PHYSICS<br />
** beam_current > 2.<br />
** num_events > 10000<br />
** radiator_id != 5 # NOT RETRACTED<br />
** solenoid_current > 100. (?) <br />
** Subtype: Empty Target<br />
*** target_type == "EMPTY & Ready"<br />
** Subtype: Amorphous Radiator<br />
*** radiator_id == 0<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Coherent Beam<br />
*** radiator_id != 5 AND radiator_id != 0.<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Field Off<br />
*** solenoid_current < 100.<br />
<br />
Corresponding RCDB queries:<br />
<syntaxhighlight lang="python"><br />
@is_production = run_type == 'hd_all.tsg' and # Production Run<br />
daq_run == 'PHYSICS' and <br />
beam_current > 2 and <br />
num_events > 10000 and <br />
radiator_id !=5 and <br />
solenoid_current > 100<br />
<br />
<br />
@is_cosmic = run_type == 'hd_all.tsg_cosmic' and<br />
'COSMIC' in daq_run and<br />
beam_current < 1.<br />
<br />
<br />
@is_empty_target = target_type == "EMPTY & Ready" # Empty Target<br />
<br />
<br />
@is_amorph_radiator = radiator_id == 0 and target_type == "FULL & Ready" # Amorphous Radiator<br />
<br />
<br />
@is_coherent_beam = radiator_id != 5 and radiator_id != 0 # Coherent Beam<br />
<br />
<br />
@is_field_off = solenoid_current < 100 # Field Off<br />
<br />
<br />
@is_field_on = solenoid_current >= 100 # Field On<br />
<br />
<br />
@status_approved = status == 1<br />
<br />
<br />
@status_unchecked = status == -1<br />
<br />
<br />
@status_reject = status == 0<br />
<br />
</syntaxhighlight><br />
<br />
<br />
* status flag<br />
** unchecked = -1<br />
** rejected = 0<br />
** approved = 1<br />
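<br />
The three status codes above can be kept in one place in analysis scripts. A minimal sketch, assuming nothing beyond the values listed; the names ''RunStatus'' and ''status_label'' are hypothetical and not part of RCDB:<br />

```python
from enum import IntEnum


class RunStatus(IntEnum):
    """Status flag values as listed above."""
    UNCHECKED = -1
    REJECTED = 0
    APPROVED = 1


def status_label(value):
    """Translate a stored integer status into a readable label."""
    return RunStatus(value).name.lower()


print(status_label(1))
```

An unknown code raises ValueError, which is usually preferable to silently treating it as approved.<br />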
<br />
<br />
* Type: Cosmic run<br />
** run_type == hd_all.tsg_cosmic<br />
** daq_run == COSMIC*<br />
** beam_current < 1.<br />
** Subtype: Magnet off<br />
*** solenoid_current < 100.<br />
** Subtype: Magnet on<br />
*** solenoid_current > 100.<br />
<br />
* Other types?<br />
** Normalization run (super-low-current, tagger trigger, TAC in, definitions need input from Somov)<br />
** PS-only? (needs input from Somov)<br />
** LED Pulser only?</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX-Collaboration-Feb-2016&diff=73215GlueX-Collaboration-Feb-20162016-02-19T22:02:27Z<p>Romanov: /* Friday February 19, 2016 */</p>
<hr />
<div>== GlueX Collaboration Meeting ==<br />
<br />
<font size="+1">February 18 to 20, 2016 at Jefferson Lab</font><br />
<br />
The following template has been set up to allow people to identify what needs to be presented at the meeting. It has been roughly broken out by working group with suggestions for topics within each group. The working group chairs should work on adding the relevant talks with an estimated time (including questions) and the speaker's name. If there is a talk that does not fit into this template, please add it at the bottom.<br />
<br />
== Registration ==<br />
Everyone participating in the collaboration meeting, whether in person at JLab or remotely via electronic media, is encouraged to register. Please visit the<br />
[https://misportal.jlab.org/Ul/conferences/generic_conference/registration.cfm?conference_id=COLLAB-GLUEX-FEB2016 Registration Page].<br />
<br />
To see who else will be attending either in person or via electronic media, please see the<br />
[https://misportal.jlab.org/Ul/conferences/generic_conference/participants.cfm?conference_id=COLLAB-GLUEX-FEB2016 Current List of Participants Page]<br />
<br />
== Location ==<br />
CEBAF Center F113<br />
<br />
== Remote Access ==<br />
# To join via a Web Browser, go to the page [https://bluejeans.com/660743227] https://bluejeans.com/660743227.<br />
# To join via Polycom room system go to the IP Address: 199.48.152.152 ([http://bjn.vc bjn.vc]) and enter the meeting ID: 660743227.<br />
# To join via phone, use one of the following numbers and the Conference ID: 660743227.<br />
#* US or Canada: +1 408 740 7256 or <br />
#* US or Canada: +1 888 240 2560<br />
# More information on connecting to [[Connect to the Data Challenge Meetings|bluejeans]] is available.<br />
<br />
== Talks on the DocDB ==<br />
<br />
The DocDB talk-upload template for the October 2015 Collaboration Meeting can be found here:<br />
<br />
http://argus.phys.uregina.ca/cgi-bin/private/DocDB/DisplayMeeting?conferenceid=47<br />
<br />
'''Upload your talks at the link above and then edit the agenda below to enter the appropriate link to your talk based on its DocDB document number'''. <br />
<br />
Example (from the May 2008 meeting):<br />
* 15:15 Session I (90) --- Offline Working Group Meeting --- Chair Curtis Meyer<br />
** (20) [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=1042 Offline Software Status] -- D. Lawrence<br />
<br />
-----<br />
<br />
== AGENDA == <br />
<br />
== Thursday February 18, 2016 ==<br />
* 8:30 Session Ia (110) --- Opening - Chair: Matt Shepherd<br />
** 8:30 (5) --- Welcome --- Curtis Meyer<br />
** 8:35 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2933 Hall-D Update] --- Eugene Chudakov<br />
** 9:00 (35) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2930 Fall 2015 and Spring 2016 Runs] --- Alexandre Deur<br />
** 9:35 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2932 Accelerator Update] --- Todd Satogata<br />
** 10:00 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2925 Engineering Update] --- Tim Whitlatch<br />
** 10:20 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2929 Electronics Update] -Fernando Barbosa<br />
* 10:45 (30) Coffee<br />
* 11:15 Session Ib (100) --- DAQ/Trigger/Electronics - (Organizer:David L. ) Chair: - Justin Stevens<br />
<br />
** 11:15 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2928 FA125 Firmware] --- Naomi Jarvis<br />
** 11:35 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2935 DAQ Status] --- Sergey Furletov<br />
** 11:55 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2936 L1 Trigger Status] --- Alex Somov<br />
** 12:15 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2934 Controls Status] --- Hovanes Egiyan<br />
** 12:35 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2927 Online/L3 Status] --- David Lawrence<br />
* 12:55 (95) Lunch <br />
* 13:00 (70) Collaboration Board Meeting<br />
* 14:30 Session IIa (180) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2931 Beamline and Tagger I] - (Organizer: R. Jones ) - Chair: - Elton Smith<br />
** 14:30 (30) --- Background conditions in tagger hall and collimator cave --- Alexandre Deur<br />
** 15:00 (25) --- Fast feedback control system --- Trent Allison<br />
** 15:25 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2940 Status of the pair spectrometer] --- Alex Somov<br />
** 15:50 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2937 Status of the tagger hodoscope] --- Franz Klein<br />
** 16:15 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2939 Status of the tagger microscope] --- Alex Barnes<br />
* 16:40 (20) Coffee<br />
** 17:00 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2938 Status of the polarimeter] --- Michael Dugger<br />
** 17:25 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2941 Status of thin diamond fabrication and assessment] --- Brendan Pratt, Richard Jones<br />
* 18:15 --- Reception in the Atrium<br />
<br />
== Friday February 19, 2016 ==<br />
* 9:30 Session IIIa (100) --- Calorimeters - (Organizer: Zisis Papandreou) - Chair: Sean Dobbs<br />
** 9:30 (25) --- [http://argus.phys.uregina.ca/gluex/DocDB/0029/002945/001/BCALUpdate_20160219.pdf BCAL Calibration Update] - Zisis Papandreou<br />
** 9:55 (25) --- [http://argus.phys.uregina.ca/gluex/DocDB/0029/002944/001/BCAL_Analyses_Update_Feb19.pdf BCAL Analyses] - Will McGinley<br />
**10:20 (25) --- [http://argus.phys.uregina.ca/gluex/DocDB/0029/002943/002/FCAL_collab_meeting_feb19.pdf FCAL Update] - Adesh Subedi<br />
**10:40 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2920 FCAL Insert] - Liping Gan <br />
* 11:05 (20) Coffee<br />
* 11:25 Session IIIb (100) --- Tracking - (Organizer: Naomi Jarvis) - Chair: Paul Mattione<br />
** 11:25 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2947 CDC] -- Mike Staib<br />
** 11:50 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2942 Tracking] -- Simon Taylor<br />
** 12:15 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2948 TRD] -- Sergey Furletov<br />
* 12:40 Lunch (80)<br />
* 14:00 Session IVa (100) --- Particle ID - (Organizer: Justin Stevens ) - Chair: Mark Dalton<br />
** 14:00 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2949 Time-of-Flight Status] -- Sasha Ostrovidov<br />
** 14:25 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2950 Start Counter Status] -- Mahmoud Kamel<br />
** 14:50 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2946 DIRC I] - John Hardin<br />
** 15:15 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2924 DIRC II] - Maria Patsyuk<br />
* 15:40 Coffee (30)<br />
* 16:10 Session IVb (125) --- Offline/Analysis (Organizer: Mark Ito ) Chair: Volker Crede<br />
** 16:10 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2926 Offline Software Update] - Mark Ito<br />
** 16:30 (20) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2952 Offline Monitoring] - Paul Mattione<br />
** 16:50 (35) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2951 Calibration Update] - Sean Dobbs<br />
** 17:25 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2953 hdpm] - Nathan Sparks<br />
** 17:50 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2954 Databases] - Dmitry Romanov<br />
* 18:15 Adjourn<br />
<br />
== Saturday February 20, 2016 ==<br />
* 9:00 Session Va (120) --- Physics I - (Organizer: Volker Crede) - Chair: Curtis Meyer<br />
** 9:00 (30) --- Report from K<SUB>L</SUB> Workshop - Moskov Amaryon<br />
** 9:30 (25) --- Plans for a Dalitz-plot analysis of &omega; &rarr; 3&pi; and &phi; &rarr; 3&pi; - Alyssa Henderson<br />
** 9:55 (25) --- [http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2915 A survey of multi-photon final states] - Simon Taylor <br />
** 10:20 (25) --- Updates on p&eta;' and the &pi;<SUP>0</SUP> beam asymmetry - Eric Pooser / Tegan Beattie<br />
** 10:45 (25) --- Preparations for the pion polarizability measurement - Rory Miskimen<br />
* 11:10 (20) Coffee<br />
* 11:30 Session Vb (30) --- Physics II - (Organizer: Volker Crede) - Chair: Mark Ito<br />
** 11:20 (15) --- [[GlueX_Physics_Workshop_2016|The May Physics / Analysis Workshop]] - Paul Mattione<br />
** 11:35 (20) --- Discussion on physics analysis and first publications - Curtis Meyer<br />
* 11:55 Session Vc (30) --- Business Meeting - Chair: Matt Shepherd<br />
** 11:55 (20) --- Report from the collaboration board - Volker Crede<br />
** 12:15 (15) --- Moving forward and closeout - Curtis Meyer<br />
* 12:30 Adjourn</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_Standard_Searches&diff=73188RCDB Standard Searches2016-02-18T19:25:09Z<p>Romanov: </p>
<hr />
<div>Goal: Classify and search for non-DAQ test runs<br />
<br />
* Type: Production Run<br />
** run_type == hd_all.tsg<br />
** daq_run == PHYSICS<br />
** beam_current > 2.<br />
** num_events > 10000<br />
** radiator_id != 5 # NOT RETRACTED<br />
** solenoid_current > 100. (?) <br />
** Subtype: Empty Target<br />
*** target_type == "EMPTY & Ready"<br />
** Subtype: Amorphous Radiator<br />
*** radiator_id == 0<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Coherent Beam<br />
*** radiator_id != 5 AND radiator_id != 0.<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Field Off<br />
*** solenoid_current < 100.<br />
<br />
Corresponding RCDB queries:<br />
<syntaxhighlight lang="python"><br />
@is_production = run_type == 'hd_all.tsg' and # Production Run<br />
daq_run == 'PHYSICS' and <br />
beam_current > 2 and <br />
num_events > 10000 and <br />
radiator_id !=5 and <br />
solenoid_current > 100<br />
<br />
<br />
@is_cosmic = run_type == 'hd_all.tsg_cosmic' and<br />
'COSMIC' in daq_run and<br />
beam_current < 1.<br />
<br />
<br />
@is_empty_target = target_type == "EMPTY & Ready" # Empty Target<br />
<br />
<br />
@is_amorph_radiator = radiator_id == 0 and target_type == "FULL & Ready" # Amorphous Radiator<br />
<br />
<br />
@is_coherent_beam = radiator_id != 5 and radiator_id != 0 # Coherent Beam<br />
<br />
<br />
@is_field_off = solenoid_current < 100 # Field Off<br />
<br />
<br />
@is_field_on = solenoid_current >= 100 # Field On<br />
<br />
<br />
@status_approved = status == 1<br />
<br />
<br />
@status_unchecked = status == -1<br />
<br />
<br />
@status_reject = status == 0<br />
<br />
</syntaxhighlight><br />
<br />
<br />
* status flag<br />
** unchecked = -1<br />
** rejected = 0<br />
** approved = 1<br />
<br />
<br />
* Type: Cosmic run<br />
** run_type == hd_all.tsg_cosmic<br />
** daq_run == COSMIC*<br />
** beam_current < 1.<br />
** Subtype: Magnet off<br />
*** solenoid_current < 100.<br />
** Subtype: Magnet on<br />
*** solenoid_current > 100.<br />
<br />
* Other types?<br />
** Normalization run (super-low-current, tagger trigger, TAC in, definitions need input from Somov)<br />
** PS-only? (needs input from Somov)<br />
** LED Pulser only?</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_Standard_Searches&diff=73187RCDB Standard Searches2016-02-18T19:21:22Z<p>Romanov: </p>
<hr />
<div>Goal: Classify and search for non-DAQ test runs<br />
<br />
* Type: Production Run<br />
** run_type == hd_all.tsg<br />
** daq_run == PHYSICS<br />
** beam_current > 2.<br />
** num_events > 10000<br />
** radiator_id != 5 # NOT RETRACTED<br />
** solenoid_current > 100. (?) <br />
** Subtype: Empty Target<br />
*** target_type == "EMPTY & Ready"<br />
** Subtype: Amorphous Radiator<br />
*** radiator_id == 0<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Coherent Beam<br />
*** radiator_id != 5 AND radiator_id != 0.<br />
*** target_type == "FULL & Ready"<br />
** Subtype: Field Off<br />
*** solenoid_current < 100.<br />
<br />
Corresponding RCDB queries:<br />
<syntaxhighlight lang="python" line="1" ><br />
@is_production = run_type == 'hd_all.tsg' and # Production Run<br />
daq_run == 'PHYSICS' and <br />
beam_current > 2 and <br />
num_events > 10000 and <br />
radiator_id !=5 and <br />
solenoid_current > 100<br />
<br />
@is_cosmic = run_type == 'hd_all.tsg_cosmic' and<br />
'COSMIC' in daq_run and<br />
beam_current < 1.<br />
<br />
@is_empty_target = target_type == "EMPTY & Ready" # Empty Target<br />
<br />
@is_amorph_radiator = radiator_id == 0 and target_type == "FULL & Ready" # Amorphous Radiator<br />
<br />
@is_coherent_beam = radiator_id != 5 and radiator_id != 0 # Coherent Beam<br />
<br />
@is_field_off = solenoid_current < 100 # Field Off<br />
<br />
@is_field_on = solenoid_current >= 100 # Field On<br />
<br />
@status_approved = status == 1<br />
<br />
@status_unchecked = status == -1<br />
<br />
@status_reject = status == 0<br />
<br />
</syntaxhighlight><br />
<br />
<br />
* status flag<br />
** unchecked = -1<br />
** rejected = 0<br />
** approved = 1<br />
<br />
<br />
* Type: Cosmic run<br />
** run_type == hd_all.tsg_cosmic<br />
** daq_run == COSMIC*<br />
** beam_current < 1.<br />
** Subtype: Magnet off<br />
*** solenoid_current < 100.<br />
** Subtype: Magnet on<br />
*** solenoid_current > 100.<br />
<br />
* Other types?<br />
** Normalization run (super-low-current, tagger trigger, TAC in, definitions need input from Somov)<br />
** PS-only? (needs input from Somov)<br />
** LED Pulser only?</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=GlueX_Shifts&diff=73166GlueX Shifts2016-02-17T20:15:04Z<p>Romanov: /* Electronic Logbook */ Updated a link to the logbook</p>
<hr />
<div>__TOC__<br />
<br />
= Shift Schedule =<br />
* [http://www.jlab.org/Hall-D/shifts/ Shift Schedule]<br />
<br />
= Contact Information =<br />
* MCC Crew Chief: (757) 269-7045 (Cell Phone: (757) 630-7050)<br />
* Lab Security: (757) 269-5822<br />
* HallD/GlueX Run Coordinator:<br />
* [http://www.jlab.org/accel/RadCon/ RadCon Page] Phone (757) 269-7216 (Cell Phone (757) 876-1743)<br />
* [http://www.jlab.org/accel/ops/ops_liaison/Hall_D/hd.html Accelerator Liaison Page]<br />
<br />
= Shift Activities =<br />
=== Electronic Logbook ===<br />
* [https://logbooks.jlab.org/book/hdlog Electronic Log Book]<br />
<br />
=== Run Control ===<br />
* Start a Run<br />
* Stop a Run<br />
* Pause a Run<br />
* Resume a Run<br />
=== Calibration Runs ===<br />
<br />
=== Monitoring Detector Operation ===<br />
<br />
=== Monitoring Data Quality ===<br />
<br />
= Information for Shift Takers =<br />
[https://halldweb.jlab.org/hdops/wiki Documentation Wiki]<br />
<br />
= Safety Information =</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66248RCDB conditions python2015-04-17T08:12:04Z<p>Romanov: /* Installation */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are the way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options of conditions include:<br />
<br />
* A condition can also hold the time of its occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, so it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
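<br />
The options above can be pictured with a small in-memory model. This is only a sketch of the data model under stated assumptions, not the RCDB API: the class ''ToyConditionStore'', its methods, and the ''hv_trip'' condition are all invented for illustration.<br />

```python
from collections import defaultdict
from datetime import datetime


class ToyConditionStore:
    """In-memory illustration of the model above; not the RCDB API."""

    def __init__(self):
        self._many_per_run = {}            # condition name -> bool
        self._values = defaultdict(list)   # (run, name) -> [(value, time), ...]

    def declare(self, name, many_per_run=False):
        """Register a condition name and whether a run may hold several values."""
        self._many_per_run[name] = many_per_run

    def add(self, run, name, value, time=None):
        """Attach a value (and optional time) to a run, enforcing the one-value rule."""
        key = (run, name)
        if not self._many_per_run[name] and self._values[key]:
            raise ValueError(f"{name} already set for run {run}")
        self._values[key].append((value, time))

    def get(self, run, name):
        """Return all (value, time) pairs stored for this run and name."""
        return list(self._values[(run, name)])


store = ToyConditionStore()
store.declare("event_count")                   # strictly one value per run
store.declare("hv_trip", many_per_run=True)    # hypothetical multi-value condition
store.add(100, "event_count", 1663)
store.add(100, "hv_trip", "sector 3", datetime(2015, 4, 17, 8, 0))
store.add(100, "hv_trip", "sector 5", datetime(2015, 4, 17, 9, 30))
```

A second ''add'' of event_count for run 100 raises ValueError, which is the "strictly one value per run" guarantee in miniature.<br />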
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is developed using the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and very<br />
straightforward functions to manage conditions. But users can use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType  # assumed import path for ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is a small, handy command line tool, '''rcnd''', that lets you view RCDB conditions and write values<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (i.e. the value is the same for many runs).<br />
That is because each condition value is saved independently (and attached) for each run.<br />
<br />
Bulk data of this kind is better saved using other RCDB options; the RCDB file-saving mechanism is one example.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set<br />
the environment variables that rcdb needs:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory,<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH''',<br />
* appends the rcdb bin folder to '''$PATH'''.<br />
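<br />
A quick way to confirm that the script did its job is to inspect these variables from Python. The sketch below is only an illustration: <code>check_rcdb_environment</code> is a hypothetical helper, not part of RCDB, and because the exact name of the bin folder is not stated here, it only checks that some '''$PATH''' entry lies under '''$RCDB_HOME'''.<br />
<br />
```python
import os


def check_rcdb_environment(environ):
    """Return a list of problems found in an environment mapping.

    Hypothetical helper for illustration; it is not shipped with RCDB.
    """
    home = environ.get("RCDB_HOME")
    if not home:
        return ["RCDB_HOME is not set"]

    problems = []

    # $RCDB_HOME/python is expected to be one of the PYTHONPATH entries
    python_dir = os.path.join(home, "python")
    if python_dir not in environ.get("PYTHONPATH", "").split(os.pathsep):
        problems.append("PYTHONPATH does not contain " + python_dir)

    # Assumption: the rcdb bin folder lives somewhere under $RCDB_HOME
    path_entries = environ.get("PATH", "").split(os.pathsep)
    if not any(entry.startswith(home) for entry in path_entries):
        problems.append("PATH has no entry under " + home)

    return problems


# An empty list means the three variables look as described above
print(check_rcdb_environment(os.environ))
```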
<br />
<br />
3. '''Choose database'''<br />
<br />
The main database is the MySQL server in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<hdops_pwd>@gluondb1/rcdb<br />
</pre><br />
<br />
An SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum needed to start with RCDB conditions, i.e. to store values and to read them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create an RCDBProvider object that connects to the DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run the example:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections go through this example and explain each part in detail.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently provide fewer possibilities for data manipulation than the Python API.<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Create a condition type (needs to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in the [[#Command line tools]] section below.<br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. Database<br />
parameters are passed to it as a connection string. It also carries the functions that manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, RCDB is meant to be used with MySQL and SQLite databases. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite has no user_name and password, the connection string starts with three slashes ///.<br />
Thus an absolute path to a file gives four slashes ////:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
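<br />
The slash counting can be spelled out with a tiny helper. This is only a sketch: <code>sqlite_connection_string</code> is a hypothetical function, not part of RCDB, and it assumes POSIX-style paths.<br />
<br />
```python
def sqlite_connection_string(path):
    """Build an SQLite connection string of the form sqlite:///<path>.

    Hypothetical helper, not part of RCDB. The prefix always ends with
    three slashes; an absolute POSIX path brings its own leading slash,
    which is where the fourth slash comes from.
    """
    return "sqlite:///" + path


print(sqlite_connection_string("example.db"))
print(sqlite_connection_string("/home/user/example.db"))
```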
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above the class constructor is used to connect to the database. There are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't actually connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects from the connection string. Thus, ''connect'' raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
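<br />
The deferred behaviour can be illustrated with the standard library alone. The sketch below is a toy model, not RCDB or SQLAlchemy code: <code>LazyProvider</code> and <code>fetch_one</code> are made-up names, and the class only remembers the database location in ''connect'' and opens it on the first query.<br />
<br />
```python
import sqlite3


class LazyProvider(object):
    """Toy stand-in (not RCDB code) that opens the database on first use."""

    def __init__(self):
        self.path = None
        self._connection = None

    def connect(self, path):
        # Only remember where the database is; nothing is opened yet
        self.path = path

    def fetch_one(self, sql):
        # The real connection attempt happens here, on first data retrieval
        if self._connection is None:
            self._connection = sqlite3.connect(self.path)
        return self._connection.execute(sql).fetchone()


db = LazyProvider()
db.connect("/no/such/directory/example.db")   # no error at this point

try:
    db.fetch_one("SELECT 1")
    failed_on_first_query = False
except sqlite3.OperationalError:
    failed_on_first_query = True

print(failed_on_first_query)
```
<br />
If a script should fail early rather than in the middle of its work, one option is to run a cheap read right after connecting.<br />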
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part consists of 3 tables:<br />
<br />
<br />
<pre><br />
RUNS           CONDITIONS        CONDITION_TYPES<br />
number   <--   run_num           name<br />
               type_id     -->   field_type<br />
               *_value           is_many_per_run<br />
               time<br />
</pre><br />
<br />
<br />
So when we talk about a name-value pair for a run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* Finally, values are stored in the conditions table; each of its records references a run and a condition type.<br />
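<br />
To make the relations concrete, here is a minimal sketch of such a layout using the standard sqlite3 module and an in-memory database. It is not the actual RCDB schema: the column names follow the diagram above, while the <code>id</code> columns, the column types and the single <code>int_value</code> column are assumptions made for the illustration.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        field_type TEXT,
        is_many_per_run INTEGER
    );
    CREATE TABLE conditions (
        id INTEGER PRIMARY KEY,
        run_num INTEGER REFERENCES runs(number),
        type_id INTEGER REFERENCES condition_types(id),
        int_value INTEGER,
        time TEXT
    );
""")

# event_count = 1663 for run 100, spread over the three tables
conn.execute("INSERT INTO runs (number) VALUES (100)")
conn.execute("INSERT INTO condition_types (id, name, field_type, is_many_per_run) "
             "VALUES (1, 'event_count', 'int', 0)")
conn.execute("INSERT INTO conditions (run_num, type_id, int_value) VALUES (100, 1, 1663)")

# Reading a name-value pair back means joining conditions with condition_types
rows = conn.execute("""
    SELECT t.name, c.int_value
    FROM conditions AS c
    JOIN condition_types AS t ON t.id = c.type_id
    WHERE c.run_num = 100
""").fetchall()

print(rows)
```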
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number  # int - The run number<br />
    start_time  # datetime - Run start time<br />
    end_time  # datetime - Run end time<br />
    conditions  # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name  # str(max 255) - A name of condition<br />
    value_type  # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run  # bool - True if the value is allowed many times per run<br />
    values  # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time  # datetime - time related to the condition (for example, when it occurred)<br />
    run_number  # int - the run number<br />
<br />
    @property<br />
    value  # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value  # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value  # holds data if type INT_FIELD<br />
    float_value  # holds data if type FLOAT_FIELD<br />
    bool_value  # holds data if type BOOL_FIELD<br />
<br />
    run  # Run - Run object associated with the run_number<br />
    type  # ConditionType - link to associated condition type<br />
    name  # str - link to type.name. See ConditionType.name<br />
    value_type  # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, in reality the data is stored in one of these fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you read the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
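<br />
The dispatch can be pictured with a small stand-alone class. This is an illustration only, not the RCDB implementation: <code>SketchCondition</code> is a made-up name, the mapping simply mirrors the table above, and the time type is left out because the table does not list a storage field for it.<br />
<br />
```python
class SketchCondition(object):
    """Illustration only, not the RCDB Condition class."""

    # value_type -> storage field, as listed in the table above
    STORAGE_FIELD = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
    }

    def __init__(self, value_type, stored_value):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        # Write the value into the one field that matches its type
        setattr(self, self.STORAGE_FIELD[value_type], stored_value)

    @property
    def value(self):
        # Read back from the field selected by value_type
        return getattr(self, self.STORAGE_FIELD[self.value_type])


count = SketchCondition("int", 1663)
print(count.int_value)
print(count.text_value)
print(count.value)
```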
<br />
<br />
'''Why is it so?''' - because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs, to be interpreted later.<br />
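<br />
Why a typed column matters for such queries can be seen in a small experiment with the standard sqlite3 module. The table below is a throwaway example, not the RCDB schema; it stores the same made-up event counts once as integers and once as text, then applies the same "greater than 100000" filter to both columns.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_num INTEGER, int_value INTEGER, text_value TEXT)")

# The same event counts stored twice: once as integers, once as text
for run, count in [(1, 900), (2, 150000), (3, 2000000)]:
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)", (run, count, str(count)))

# Numeric comparison on the integer column
by_int = [row[0] for row in conn.execute(
    "SELECT run_num FROM conditions WHERE int_value > 100000 ORDER BY run_num")]

# Character-by-character comparison on the text column
by_text = [row[0] for row in conn.execute(
    "SELECT run_num FROM conditions WHERE text_value > '100000' ORDER BY run_num")]

print(by_int)
print(by_text)
```
<br />
The text comparison also selects run 1, because the string '900' sorts after '100000'. Only the integer column gives the answer the question intends.<br />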
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. This is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
value_type=ConditionType.INT_FIELD,<br />
is_many_per_run=False,<br />
description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't enforce any naming convention and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
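<br />
Since the API does not check names and gives slashes no special meaning, any such conventions have to live in user code. The sketch below shows two possible helpers; both functions and all the condition names in it are made up for the illustration and are not part of RCDB.<br />
<br />
```python
def has_whitespace(name):
    """Return True if the name contains any whitespace character."""
    return any(ch.isspace() for ch in name)


def group_by_prefix(names, separator="/"):
    """Group names by the text before the first separator ('' if there is none)."""
    groups = {}
    for name in names:
        prefix = name.split(separator, 1)[0] if separator in name else ""
        groups.setdefault(prefix, []).append(name)
    return groups


# Made-up names, for illustration only
names = ["daq/event_count", "daq/run_type", "beam/current", "target_material"]

# Names worth rejecting before they reach the database
print([n for n in names + ["bad name"] if has_whitespace(n)])

# A directory-like view built purely on the client side
print(group_by_prefix(names))
```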
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - the API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - the API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected of run conditions. Sometimes it is intended to<br />
have strictly one name-value pair for a run; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; hall "''temperature''" and "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows several values to be set for different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description of at most 255 characters that other users can see. It is optional, but filling it in is very<br />
good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of these field types should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that (let's revisit the first example):<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' field holds the time information, and the time can be passed as<br />
the value parameter of the add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You are then able to add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If run has this condition, with the same value and actual_time the func. DOES NOTHING<br />
# 2. If run has this conditions but at different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended for storing arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side, which gives you better control over it.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u"x" instead of just "x").<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the ordinary string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project that can be installed<br />
with pip. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle Documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much like that for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in whatever way you like<br />
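<br />
The serialize and deserialize steps could look like the sketch below. The choice of pickle plus base64 is just one possibility picked for the illustration, not something RCDB prescribes, and the payload is made up. Base64 is used because the value has to be saved as a string. Note that pickled data should only be loaded from sources you trust.<br />
<br />
```python
import base64
import pickle

# Made-up payload that is awkward to express as plain JSON (it contains a tuple)
payload = {"thresholds": [0.5, 1.5, 2.5], "mask": (1, 0, 1)}

# Serialize to bytes, then to ASCII text, since the value is saved as a string
text_to_store = base64.b64encode(pickle.dumps(payload, protocol=2)).decode("ascii")

# text_to_store is what would be handed to add_condition for a BLOB_FIELD
# condition, and what Condition.value would hand back later

# Deserialize: text -> bytes -> object
restored = pickle.loads(base64.b64decode(text_to_store))

print(restored == payload)
```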
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human-readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. I hope it will never be used.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING.<br />
# If the run has this condition but at a different time, the function adds the new condition to the DB.<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
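<br />
The replacement rules of this section, for both the single-value and the many-per-run case, can be summarized in a small in-memory model. This is a sketch of the behaviour described here, not the RCDB implementation: <code>ToyConditionStore</code> and its <code>get</code> method are made up, and <code>OverrideError</code> merely stands in for the OverrideConditionValueError mentioned in the first example.<br />
<br />
```python
from datetime import datetime


class OverrideError(Exception):
    """Stand-in for the OverrideConditionValueError named in the example above."""


class ToyConditionStore(object):
    """In-memory model of the rules above; not the RCDB implementation."""

    def __init__(self):
        self.is_many = {}    # condition name -> is_many_per_run
        self.records = {}    # (run, name) -> list of [value, time]

    def create_condition_type(self, name, is_many_per_run):
        self.is_many[name] = is_many_per_run

    def add_condition(self, run, name, value, time=None, replace=False):
        records = self.records.setdefault((run, name), [])
        if self.is_many[name]:
            # Many per run: only a record with the same time can clash
            clashing = [r for r in records if r[1] == time]
        else:
            # One per run: any existing record clashes
            clashing = list(records)

        if not clashing:
            records.append([value, time])
        elif clashing[0] == [value, time]:
            pass    # same value and time: do nothing
        elif replace:
            clashing[0][:] = [value, time]    # overwrite the record in place
        else:
            raise OverrideError("{} already set for run {}".format(name, run))

    def get(self, run, name):
        return [tuple(r) for r in self.records.get((run, name), [])]


time1 = datetime(2015, 9, 1, 14, 21, 1, 222)
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)

store = ToyConditionStore()

store.create_condition_type("timed", is_many_per_run=False)
store.add_condition(1, "timed", 1, time1)
store.add_condition(1, "timed", 5, time2, replace=True)   # replaces the single record

store.create_condition_type("multi", is_many_per_run=True)
store.add_condition(1, "multi", 2222)
store.add_condition(1, "multi", 3333, time1)
store.add_condition(1, "multi", 4444, time2)

print(store.get(1, "timed"))
print(store.get(1, "multi"))
```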
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also glues the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's see a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It actually isn't loaded even when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially during development and debugging.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
After the commit, all data for all runs is accessible under the name '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Objects are deleted with the <code>session.delete</code> function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
To get a Run object by its run number:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions    # conditions are described further below<br />
</syntaxhighlight><br />
<br />
Querying for runs is described in the sections below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import Run<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets the SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - says that we want Run objects to be returned, which also determines the table to query<br />
<br />
'''.filter(Run.number < 100)''' - the filtering clause<br />
<br />
Once the query is ready, we can fetch objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the matching runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions of each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy, but there is one drawback: each selected run issues its own SELECT query to the DB to get its<br />
conditions. That may be acceptable in many cases.<br />
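<br />
For comparison, the same values can be fetched with a single SELECT that joins the tables. The sketch below uses plain SQL through Python's standard <code>sqlite3</code> module on a simplified, assumed schema; the table and column names are chosen for the illustration and are not copied from the real RCDB schema. It is written for Python 3.<br />
<br />
```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conditions (run_number INTEGER, type_id INTEGER, int_value INTEGER);
    INSERT INTO condition_types VALUES (1, 'event_count');
""")

# Dummy data: runs 48..57, each with event_count = run number * 10
for number in range(48, 58):
    connection.execute("INSERT INTO runs VALUES (?)", (number,))
    connection.execute("INSERT INTO conditions VALUES (?, 1, ?)", (number, number * 10))

# One query returns the run numbers together with their event_count values
rows = connection.execute("""
    SELECT runs.number, conditions.int_value
    FROM runs
    JOIN conditions ON conditions.run_number = runs.number
    JOIN condition_types ON condition_types.id = conditions.type_id
    WHERE condition_types.name = 'event_count'
      AND runs.number BETWEEN 50 AND 55
    ORDER BY runs.number DESC
""").fetchall()

for number, event_count in rows:
    print(number, event_count)
```
<br />
The loop over <code>rows</code> touches no further tables, so the number of queries does not grow with the number of selected runs.<br />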
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, since RCDBProvider gives access to the SQLAlchemy session, the full<br />
power of SQLAlchemy queries is available.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will also use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already see the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Second, when you add more logic, the query becomes bulky.<br />
<br />
<br />
Consider the next example. We look for runs in the range 1000 to 2000 that have either event_count > 10000 or data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the usual logical operators ('''&&''' and '''||''' in C-like languages, '''and''' and '''or''' in Python).<br />
SQLAlchemy overloads these operators to build SQL conditions.<br />
<br />
Note also that each such expression must be enclosed in parentheses. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but that doesn't improve readability.<br />
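<br />
How such overloading can work is sketched below. This is not SQLAlchemy code: the <code>Column</code> and <code>Expr</code> classes are invented for the illustration and only build SQL-like text. Python's <code>and</code> and <code>or</code> cannot be overloaded, which is why <code>&</code> and <code>|</code> are used; and because <code>&</code> binds tighter than <code>==</code> and <code>></code>, every comparison needs its own parentheses.<br />
<br />
```python
class Expr:
    """Hypothetical SQL fragment that can be combined with & and |."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Expr("({} AND {})".format(self.sql, other.sql))

    def __or__(self, other):
        return Expr("({} OR {})".format(self.sql, other.sql))


class Column:
    """Hypothetical column whose comparisons return Expr objects, not booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Expr("{} = {!r}".format(self.name, value))

    def __gt__(self, value):
        return Expr("{} > {!r}".format(self.name, value))


name = Column("name")
int_value = Column("int_value")

# Each comparison is parenthesised so that it is evaluated before & and |
clause = ((name == "event_count") & (int_value > 10000)) | (name == "data_value")
print(clause.sql)
```
<br />
Without the inner parentheses Python would try to evaluate <code>"event_count" & int_value</code> first, which is not the intended comparison.<br />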
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
The RCDB ConditionType class provides helper properties that make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
db.session.query(Run).join(Run.conditions).join(Condition.type).filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the matching Condition.int_value for the comparison.<br />
<br />
<br />
There is a limitation, though: each condition type needs its own query. Such queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import ConditionType, Run<br />
<br />
# create in-memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create condition types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)       # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)  # data_value in range 1.00 - 1.99<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores information about what is going on in the '''log_records'''<br />
table of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually through the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
The reasoning is that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you perform a series of operations on conditions, it may be a good idea to leave a<br />
log message that other users can see.<br />
<br />
<br />
Custom data modifications through SQLAlchemy, like creating or deleting objects manually with session.commit(), are not<br />
logged either, so leaving a log record is up to the user here as well.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means that the message is not related to a specific database object ID. The last argument, 0, means that the message is not related to a specific run number.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you call <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you perform several actions with one object, like adding many conditions to one run or adding one condition to many runs, reusing the object can boost performance by up to 30% each. <br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> goes to committing changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy session and can be committed <br />
manually later.<br />
<br />
<br />
''auto_commit''='''False''' serves two purposes:<br />
<br />
* Making many changes and committing them all at once, which improves performance<br />
* Rolling back changes<br />
<br />
<br />
To commit the changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>. <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to the DB. The object is in the SQLAlchemy session<br />
self.db.add_condition(1, ct, 20, replace=True, auto_commit=False)<br />
self.db.add_condition(1, ct, 30, replace=True, auto_commit=False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
'''(!)''' Note, however, that more complex scenarios with uncommitted objects have not been tested.<br />
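<br />
The effect of deferring the commit can also be seen at the level of a plain database connection. The sketch below does not use RCDB or SQLAlchemy; it relies only on Python's standard <code>sqlite3</code> module with an assumed one-table layout, and shows the two uses named above: one commit for many changes, and a rollback of pending changes. It is written for Python 3.<br />
<br />
```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER)")
connection.commit()

# Many inserts are collected in one transaction and committed once
for run_number in range(1, 101):
    connection.execute(
        "INSERT INTO conditions VALUES (?, 'event_count', ?)",
        (run_number, run_number + 950))
connection.commit()
committed = connection.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]

# A pending change is visible on the same connection before it is committed
connection.execute("UPDATE conditions SET value = 0 WHERE run_number = 1")
pending = connection.execute(
    "SELECT value FROM conditions WHERE run_number = 1").fetchone()[0]

# Rolling back discards the pending change and restores the committed value
connection.rollback()
restored = connection.execute(
    "SELECT value FROM conditions WHERE run_number = 1").fetchone()[0]

print(committed, pending, restored)
```
<br />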
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a ccdb-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is actually located<br />
in the same directory as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions, use the '''-l''' or '''--list''' flag. It shows condition names, types and descriptions (if they exist):<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To get names only use '''--list-names''':<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
=== Getting value by the run number===<br />
To see all conditions and values for a run:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Add a condition name to get the value of that condition only:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is one of:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just a time. (You can always add time information to a value of any other type)<br />
* blob - binary blob. Avoid it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please make names shorter<br />
<br />
<br />
<br />
Write a value for run 1000 for the condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without '''--replace''', an error is raised if run 1000 already has a different value for 'my_value'.<br />
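<br />
The replace rule can be modelled in a few lines of Python. The sketch below is a toy in-memory model, not RCDB or rcnd code: the <code>ConditionStore</code> class and its methods are invented for the illustration, and the assumption that writing the same value again is accepted is inferred from the wording above.<br />
<br />
```python
class ConditionStore:
    """Toy in-memory model of the one-value-per-run rule (not RCDB code)."""

    def __init__(self):
        self._values = {}   # (run_number, name) -> value

    def write(self, run_number, name, value, replace=False):
        key = (run_number, name)
        # Refuse to overwrite a different value unless replace is requested
        if key in self._values and self._values[key] != value and not replace:
            raise ValueError("run {} already has {}={!r}".format(
                run_number, name, self._values[key]))
        self._values[key] = value

    def read(self, run_number, name):
        return self._values[(run_number, name)]


store = ConditionStore()
store.write(1000, "my_value", "first")
store.write(1000, "my_value", "first")         # same value again: accepted

try:
    store.write(1000, "my_value", "second")    # different value, no replace
    rejected = False
except ValueError:
    rejected = True

store.write(1000, "my_value", "second", replace=True)
print(rejected, store.read(1000, "my_value"))
```
<br />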
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66214RCDB conditions python2015-04-15T21:40:47Z<p>Romanov: /* Getting condition names and info */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile features of conditions include:<br />
<br />
* A condition can also hold the time at which it occurred: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, which looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. But users can use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code looks for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is also a small, handy command line tool, '''rcnd''', which lets you view RCDB conditions and write values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved (and attached) independently for each run.<br />
<br />
For bulk data, it is better to use other RCDB options, for example the file saving mechanism that RCDB provides.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set<br />
the environment variables needed by rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' - to RCDB root directory,<br />
* appends '''$PYTHONPATH''' with $RCDB_HOME/python<br />
* appends '''$PATH''' with rcdb bin folder<br />
<br />
<br />
3. '''Choose database'''<br />
<br />
The main database is a MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
An SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum needed to start with RCDB conditions, to put values in and get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections cover this example and explain it thoroughly.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently offer fewer data manipulation options than the Python API. <br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating a condition type (needs to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in [[#Command line tools]] section below.<br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It takes a connection<br />
string that carries the database parameters. It also provides functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite are the intended databases. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because an SQLite URL has no user name, password or host, the scheme is followed by three slashes (///).<br />
Thus an absolute file path results in four slashes (////):<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
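<br />
The slash counting can be checked with Python's standard URL parser. The sketch below is only an illustration of the URL layout, not the parser that SQLAlchemy or RCDB use; the <code>describe</code> function and the password <code>secret</code> are made up for the example. It is written for Python 3.<br />
<br />
```python
from urllib.parse import urlparse


def describe(connection_string):
    """Split a connection string into its parts (hypothetical helper)."""
    parts = urlparse(connection_string)
    if parts.scheme == "sqlite":
        # The host part is empty; the path starts after one separating slash
        return {"dialect": "sqlite", "file": parts.path[1:]}
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


relative = describe("sqlite:///example.db")
absolute = describe("sqlite:////home/user/example.db")
mysql = describe("mysql://rcdb:secret@localhost:3306/rcdb")

print(relative["file"])    # relative path: example.db
print(absolute["file"])    # absolute path: /home/user/example.db
print(mysql["host"], mysql["port"], mysql["database"])
```
<br />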
<br />
<br />
More about connection strings can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above the class constructor is used to connect to the database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, ''connect'' raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case, the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS           CONDITIONS         CONDITION_TYPES<br />
number   <--   run_num            name<br />
               type_id     -->    field_type<br />
               *_value            is_many_per_run<br />
               time<br />
</pre><br />
<br />
<br />
So when we talk about a name-value pair for a run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, where each record references a run and a condition_type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number              # int - The run number<br />
    start_time          # datetime - Run start time<br />
    end_time            # datetime - Run end time<br />
    conditions          # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name                # str(max 255) - The name of the condition<br />
    value_type          # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run     # bool - True if the value is allowed many times per run<br />
    values              # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants used to declare value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time                # datetime - time related to the condition (e.g. when it occurred)<br />
    run_number          # int - the run number<br />
<br />
    @property<br />
    value               # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value          # holds data if type is STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value           # holds data if type is INT_FIELD<br />
    float_value         # holds data if type is FLOAT_FIELD<br />
    bool_value          # holds data if type is BOOL_FIELD<br />
<br />
    run                 # Run - Run object associated with the run_number<br />
    type                # ConditionType - link to associated condition type<br />
    name                # str - link to type.name. See ConditionType.name<br />
    value_type          # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of the following fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you access the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
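<br />
The dispatch described above can be pictured with a small stand-alone sketch (Python 3). It is not the RCDB source code: the class <code>ConditionSketch</code> and the lookup table are hypothetical, only the field names and type strings come from the listing above. Mapping the "time" type to the ''time'' field is an assumption based on the TIME_FIELD example later on this page.<br />

```python
# Illustrative sketch only; not the RCDB implementation.

# value_type string -> name of the attribute that holds the data
STORAGE_FIELD = {
    "string": "text_value",
    "json": "text_value",
    "blob": "text_value",
    "int": "int_value",
    "float": "float_value",
    "bool": "bool_value",
    "time": "time",  # assumption: TIME_FIELD values live in the 'time' field
}


class ConditionSketch(object):
    """Hypothetical stand-in for Condition, holding one typed value."""

    def __init__(self, value_type, value):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        self.time = None
        # store the value in the field that matches its declared type
        setattr(self, STORAGE_FIELD[value_type], value)

    @property
    def value(self):
        # pick the storage field according to the declared type
        return getattr(self, STORAGE_FIELD[self.value_type])


count = ConditionSketch("int", 1000)
note = ConditionSketch("json", '{"x": 1}')
print(count.value, count.int_value, count.text_value)
print(note.value, note.int_value, note.text_value)
```

Only one of the ''xxx_value'' fields is filled for each object; ''value'' simply forwards to it.<br />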
<br />
<br />
'''Why is it so?''' Because we would like to run queries like ''"give me runs where event_count > 100 000"'',<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses a so-called<br />
''"hybrid approach to the object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in a field of the matching type,<br />
which allows the database to use that type when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs, to be interpreted later by the user.<br />
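<br />
To make the idea concrete, here is a minimal sketch (Python 3) that uses the standard sqlite3 module. The table layout is a simplified assumption made for illustration, not the real RCDB schema; it only shows why a typed column makes a numeric comparison possible.<br />

```python
# Illustrative sketch; the schema below is a simplified assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT, value_type TEXT);
    CREATE TABLE conditions (
        run_number  INTEGER,
        type_id     INTEGER REFERENCES condition_types(id),
        text_value  TEXT,
        int_value   INTEGER,
        float_value REAL,
        bool_value  INTEGER
    );
    INSERT INTO condition_types VALUES (1, 'event_count', 'int');
    INSERT INTO condition_types VALUES (2, 'comment', 'string');
""")

# Each value goes to the column that matches its type
conn.executemany(
    "INSERT INTO conditions (run_number, type_id, int_value) VALUES (?, 1, ?)",
    [(1, 50000), (2, 150000), (3, 2000000)])
conn.execute(
    "INSERT INTO conditions (run_number, type_id, text_value) "
    "VALUES (2, 2, 'good run')")

# event_count sits in an integer column, so the comparison is numeric
rows = conn.execute("""
    SELECT c.run_number FROM conditions c
    JOIN condition_types t ON t.id = c.type_id
    WHERE t.name = 'event_count' AND c.int_value > 100000
    ORDER BY c.run_number
""").fetchall()
selected_runs = [row[0] for row in rows]
print(selected_runs)
```

If everything were stored as text, the same comparison would be a string comparison and would not order numbers correctly.<br />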
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" has to be created first. This is done once in the lifetime of a database.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
value_type=ConditionType.INT_FIELD,<br />
is_many_per_run=False,<br />
description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. However, spaces are likely to cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values, with different times, for the same run:<br />
<br />
* '''False''' - the API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run.<br />
<br />
* '''True''' - the API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected from run conditions. Sometimes a run is meant to<br />
have strictly one value per name; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. At other times it is<br />
desirable to track how a value changes during a run; the hall "''temperature''" and "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows setting several values, at different times, under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description, 255 characters max, that other users can see. It is optional, but filling it in is very<br />
good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, use one of the following field types:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Here is an example:<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that. Let's revisit the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
<br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' property holds the time, and the time can be passed as<br />
the value parameter of the add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. Then you can add many condition values to one run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, even<br />
if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values are allowed for the run (is_many_per_run=True):<br />
# 1. If the run has this condition with the same value and time, add_condition does nothing<br />
# 2. If the run has this condition but at a different time, the new value is added to the DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Values for time1 and time2 are added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime(2015, 9, 1, 14, 21, 1, 222), datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended for storing arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays and dictionaries this is easily done with the standard json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This is done to give you better control over it.<br />
It means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
Note that after JSON deserialization strings come back as unicode (look at u'x' instead of just 'x' in the output).<br />
This is not a problem if you just work with the restored data, because Python 2 handles unicode strings transparently.<br />
As you can see in the example, we use the plain string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question about it].<br />
<br />
Using PyYAML to deserialize to plain strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project that can be installed<br />
with pip. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? For that there is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much the same as for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in whatever way you like<br />
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human-readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. And I hope it will never be used.<br />
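<br />
One possible way to turn arbitrary binary data into a string for a BLOB_FIELD is sketched below (Python 3). The choice of pickle plus base64 is only an assumption made for illustration; RCDB does not prescribe any serialization. The resulting <code>blob_string</code> is what would be handed to ''add_condition'' and later read back from ''Condition.value''.<br />

```python
# Illustrative sketch; pickle + base64 is one option among many.
import base64
import pickle

payload = {"thresholds": (1.5, 2.5), "mask": b"\x00\xff\x10"}

# Serialize to bytes, then encode the bytes as a plain ASCII string
blob_string = base64.b64encode(pickle.dumps(payload)).decode("ascii")

# ... store blob_string in the DB, load it back later ...

# Decode the string to bytes and deserialize
restored = pickle.loads(base64.b64decode(blob_string))
print(restored == payload)
```

Keep in mind that unpickling data from an untrusted source is unsafe, so this approach only suits databases whose writers you trust.<br />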
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be passed to ''add_condition''.<br />
<br />
For a single value per run (is_many_per_run=False):<br />
<br />
# If the run already has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag: with ''replace=True'' the stored value is replaced, otherwise an OverrideConditionValueError is raised.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
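If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function does nothing.<br />
# If the run has this condition but at a different time, the new value is added to the DB.<br />
# If the run has this condition at this time but with a different value, the ''replace'' flag decides, as in the single-value case (see the "Error" and ''replace=True'' lines in the example).<br />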
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
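<br />
The rules of this section can be condensed into one small decision function. The sketch below (Python 3) is a model of the behaviour described above, not the RCDB code: <code>add_value</code>, <code>OverrideError</code> and the list of ''(value, time)'' tuples are hypothetical names introduced for illustration.<br />

```python
# Illustrative model of the add/replace rules; not the RCDB implementation.
from datetime import datetime


class OverrideError(Exception):
    """Hypothetical stand-in for OverrideConditionValueError."""


def add_value(stored, value, time=None, replace=False, many_per_run=False):
    """Apply the rules above to `stored`, a list of (value, time) records."""
    if many_per_run:
        # only a record with the same time can conflict with the new one
        clashes = [i for i, (_, t) in enumerate(stored) if t == time]
    else:
        # one value per run: any existing record conflicts
        clashes = list(range(len(stored)))

    if not clashes:
        stored.append((value, time))          # nothing in the way: add
    elif stored[clashes[0]] == (value, time):
        pass                                  # identical record: do nothing
    elif replace:
        stored[clashes[0]] = (value, time)    # explicit replacement
    else:
        raise OverrideError("record differs; pass replace=True to overwrite")
    return stored


time1 = datetime(2015, 9, 1, 14, 21, 1, 222)
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)

# is_many_per_run=True: replay of the successful calls of the example above
multi = []
add_value(multi, 1000, many_per_run=True)
add_value(multi, 1000, many_per_run=True)                 # no change
add_value(multi, 2222, replace=True, many_per_run=True)   # replaces time=None
add_value(multi, 3333, time1, many_per_run=True)
add_value(multi, 4444, time2, many_per_run=True)

# is_many_per_run=False: a different time is rejected unless replace=True
single = [(1, time1)]
try:
    add_value(single, 1, time2)
    rejected = False
except OverrideError:
    rejected = True
add_value(single, 5, time2, replace=True)

print([value for value, _ in multi], rejected, single == [(5, time2)])
```

The only difference between the two modes is which existing records count as a conflict: all of them for a single value per run, only those with the same time otherwise.<br />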
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links the Python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It actually isn't loaded even when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should be avoided. But sometimes it is needed, especially in the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
After the commit, all data for all runs is accessible under the new name '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you need to get a Run object by its run number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions of the run, described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods), or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions of each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query (desc comes from: from sqlalchemy import desc)<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # pick the single 'event_count' value out of all conditions of the run<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues its own SELECT query to the DB to get its<br />
conditions. This may be OK in many cases.<br />
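<br />
The cost of this pattern can be illustrated without RCDB at all. The sketch below (Python 3, standard sqlite3 module) uses a simplified two-table layout that is only an assumption for illustration, and a hypothetical <code>select</code> helper that records every query it sends. It compares the "one query per run" pattern with a single joined query.<br />

```python
# Illustrative sketch; table layout and helper are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER);
""")
for number in range(50, 56):
    conn.execute("INSERT INTO runs VALUES (?)", (number,))
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)",
                 (number, number * 10))

issued = []  # every SELECT sent to the database is recorded here


def select(sql, params=()):
    issued.append(sql)
    return conn.execute(sql, params).fetchall()


# Pattern 1: one query for the runs, then one more query per run
per_run = {}
for (number,) in select("SELECT number FROM runs WHERE number BETWEEN 50 AND 55"):
    rows = select("SELECT int_value FROM conditions "
                  "WHERE run_number = ? AND name = 'event_count'", (number,))
    per_run[number] = rows[0][0]
queries_per_run = len(issued)

# Pattern 2: a single JOIN fetches the same data in one query
del issued[:]
joined = dict(select("SELECT r.number, c.int_value FROM runs r "
                     "JOIN conditions c ON c.run_number = r.number "
                     "WHERE r.number BETWEEN 50 AND 55 "
                     "AND c.name = 'event_count'"))
queries_joined = len(issued)

print(queries_per_run, queries_joined, per_run == joined)
```

For six runs the first pattern sends seven queries and the second sends one, with identical results. The difference grows linearly with the number of selected runs.<br />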
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will also use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Consider the next example. We look for runs in the range 1000 to 2000 that have event_count > 10000 or some data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the common '''&&''' and '''||'''.<br />
SQLAlchemy overloads these operators to build the boolean expressions of the query.<br />
<br />
Note also that such expressions should be in parentheses. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
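<br />
Why the operators are overloaded and why the parentheses matter can be seen in a toy model (Python 3). The classes <code>Column</code> and <code>Expr</code> below are hypothetical miniatures written for this illustration; they are not SQLAlchemy classes and only mimic the idea that comparisons return expression objects instead of booleans.<br />

```python
# Toy model of operator overloading; not SQLAlchemy code.
class Expr(object):
    """Hypothetical miniature of an SQL expression object."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):      # called for: a & b
        return Expr("(%s AND %s)" % (self.text, other.text))

    def __or__(self, other):       # called for: a | b
        return Expr("(%s OR %s)" % (self.text, other.text))


class Column(object):
    """Hypothetical miniature of a mapped column."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):       # called for: column == value
        return Expr("%s = %r" % (self.name, value))

    def __gt__(self, value):       # called for: column > value
        return Expr("%s > %r" % (self.name, value))


name = Column("name")
int_value = Column("int_value")

clause = (name == "event_count") & (int_value > 10000)
print(clause.text)
```

Python's own <code>and</code> and <code>or</code> keywords cannot be overloaded, which is why libraries fall back on <code>&</code> and <code>|</code>. These two operators bind more tightly than <code>==</code> and <code>></code>, so without the parentheses Python would first try to evaluate <code>"event_count" & int_value</code>.<br />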
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
The RCDB ConditionType class provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
...query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value field for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the matching field, here Condition.int_value, for the comparison.<br />
<br />
<br />
There is a limitation: each condition type needs its own query. But the queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
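<br />
What combining per-condition queries means at the SQL level can be sketched with the standard sqlite3 module (Python 3). The single ''conditions'' table with one numeric ''value'' column, the "beam_current" condition and the <code>run_numbers</code> helper are simplifying assumptions for this illustration only.<br />

```python
# Illustrative sketch; schema, data and helper are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value REAL)")
for run in range(1, 7):
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)",
                 (run, run * 1000))
    conn.execute("INSERT INTO conditions VALUES (?, 'beam_current', ?)",
                 (run, run * 10))

# One query per condition type, as with the helper properties
many_events = ("SELECT run_number FROM conditions "
               "WHERE name = 'event_count' AND value > 2500")
low_current = ("SELECT run_number FROM conditions "
               "WHERE name = 'beam_current' AND value < 45")


def run_numbers(sql):
    """Run a query and return the selected run numbers in ascending order."""
    return sorted(row[0] for row in conn.execute(sql))


both = run_numbers(many_events + " INTERSECT " + low_current)
either = run_numbers(many_events + " UNION " + low_current)
print(both, either)
```

INTERSECT keeps the runs that satisfy both conditions, UNION the runs that satisfy at least one of them.<br />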
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)         # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)    # data_value in range 1 - 2<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 < data_value < 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores some information about what is going on in the ''log_records''<br />
table of the same database.<br />
<br />
<br />
Set '''RCDB_USER''' environment variable to have your name in logs (or set it manually in API as shown below)<br />
<br />
<br />
* Creating condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that other users can see.<br />
<br />
<br />
Custom data modifications through SQLAlchemy, like creating or deleting objects manually with session.commit(), are not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first None means there is no specific database object ID for this message. The last '0' means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you call <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you perform several actions with one object, such as adding many conditions to one run or adding one condition to many runs, reusing the object can improve performance by up to 30% for each reused object. <br />
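<br />
Why reuse helps can be illustrated with a toy model. The sketch below is not RCDB code: <code>ToyProvider</code>, its dictionaries and its <code>lookups</code> counter are hypothetical stand-ins that only count how many lookups happen when identifiers are passed instead of already fetched objects.<br />
<br />
```python
class ToyProvider(object):
    """Hypothetical stand-in for a provider; it only counts lookups."""

    def __init__(self):
        self.lookups = 0
        self.runs = {1: {"number": 1}}
        self.types = {"my_value": {"name": "my_value"}}
        self.values = []

    def get_run(self, number):
        self.lookups += 1          # stands for one database query
        return self.runs[number]

    def get_condition_type(self, name):
        self.lookups += 1          # stands for one database query
        return self.types[name]

    def add_condition(self, run, ct, value):
        # Identifiers are resolved on every call; objects are used as-is
        if isinstance(run, int):
            run = self.get_run(run)
        if isinstance(ct, str):
            ct = self.get_condition_type(ct)
        self.values.append((run["number"], ct["name"], value))


# 1. Passing run number and name: two lookups per call
by_id = ToyProvider()
for value in range(100):
    by_id.add_condition(1, "my_value", value)

# 2. Fetching the objects once and reusing them: two lookups in total
by_object = ToyProvider()
run = by_object.get_run(1)
ct = by_object.get_condition_type("my_value")
for value in range(100):
    by_object.add_condition(run, ct, value)

print(by_id.lookups)
print(by_object.lookups)
```
<br />
The toy counts lookups only; the actual gain in RCDB depends on the database and on how expensive each query is.<br />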
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
Reasons to use ''auto_commit''='''False''':<br />
<br />
* Make a lot of changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit pending changes on a provider created as <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code> <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to the DB. The object is in the SQLAlchemy cache<br />
# (the positional arguments here are time=None, replace=True, auto_commit=False)<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
(!) Note that more complex scenarios with uncommitted objects haven't been tested.<br />
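<br />
The same defer, commit and roll back cycle can be tried without RCDB. The sketch below uses only the standard <code>sqlite3</code> module; the <code>conditions</code> table layout is a hypothetical simplification, not the real RCDB schema, and the values mirror the test above.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")


def current_value():
    """Read the single stored value through the same connection."""
    return conn.execute("SELECT int_value FROM conditions").fetchone()[0]


# Pending change: visible on this connection, but not committed yet
conn.execute("INSERT INTO conditions VALUES (1, 'ac', 10)")
pending = current_value()

# Commit: now "ac"=10 is stored
conn.commit()

# Two more deferred changes
conn.execute("UPDATE conditions SET int_value = 20 WHERE name = 'ac'")
conn.execute("UPDATE conditions SET int_value = 30 WHERE name = 'ac'")
changed = current_value()

# Roll back: everything after the last commit is discarded
conn.rollback()
restored = current_value()

print(pending, changed, restored)
conn.close()
```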
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a CCDB-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is located<br />
in the same directory as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions, use the '''-l''' or '''--list''' flag. It shows condition names, types and descriptions (if any):<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To get names only use '''--list-names''':<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
=== Getting value by the run number===<br />
To see all conditions and values for a run:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Add a condition name to get the value of only that condition:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is one of:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just a time. (You can always add time information to any other type)<br />
* blob - binary blob. Avoid it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please keep names shorter<br />
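<br />
The policy can be checked before a condition type is created. The helper below is a hypothetical sketch and is not part of '''rcnd''' or the RCDB API; it only covers the two rules that can be tested mechanically (no spaces, at most 255 characters).<br />
<br />
```python
MAX_NAME_LENGTH = 255  # limit quoted in the policy above


def check_condition_name(name):
    """Return a list of policy problems found in a condition name."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains spaces, use '_' instead")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than %d characters" % MAX_NAME_LENGTH)
    return problems


print(check_condition_name("event_count"))
print(check_condition_name("event count"))
```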
<br />
<br />
<br />
Write a value for run 1000 for condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without '''--replace''' an error is raised if run 1000 already has a different value for 'my_value'.<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66213RCDB conditions python2015-04-15T21:39:34Z<p>Romanov: /* Getting values */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options of conditions include:<br />
<br />
* A condition can also hold the time of its occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, which looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. Users can still use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code looks for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is a small handy command line tool, '''rcnd''', that lets you see RCDB conditions and write values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved (and attached) independently for each run.<br />
<br />
Bulk data is better saved using other RCDB options. For example, RCDB provides a file saving mechanism.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set<br />
the environment variables for RCDB:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' - to RCDB root directory,<br />
* appends '''$PYTHONPATH''' with $RCDB_HOME/python<br />
* appends '''$PATH''' with rcdb bin folder<br />
<br />
<br />
3. '''Choose database'''<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum needed to start with RCDB conditions, i.e. to store values and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run the example:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database: <code>rm example.db</code><br />
<br />
The next sections cover this example and explain it thoroughly.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently provide fewer possibilities for data manipulation than the Python API. <br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating condition type (need to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in [[#Command line tools]] section below.<br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It uses connection<br />
strings to pass database parameters to the class. It also carries functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite are the databases intended for use. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because an SQLite connection string has no user_name and password, it starts with three slashes ///.<br />
Thus an absolute path to a file has four slashes ////:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
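<br />
The slash counting can be checked with a few lines of plain Python. This is a minimal sketch, assuming Unix-style paths; <code>sqlite_file_path</code> is a hypothetical helper written for illustration, not an RCDB or SQLAlchemy function.<br />
<br />
```python
import posixpath

PREFIX = "sqlite:///"  # scheme followed by an empty user/host part


def sqlite_file_path(connection_string):
    """Return the file path part of an SQLite connection string."""
    if not connection_string.startswith(PREFIX):
        raise ValueError("not an SQLite connection string: %r" % connection_string)
    return connection_string[len(PREFIX):]


relative = sqlite_file_path("sqlite:///example.db")
absolute = sqlite_file_path("sqlite:////home/user/example.db")

# The fourth slash is simply the leading '/' of an absolute path
print(relative, posixpath.isabs(relative))
print(absolute, posixpath.isabs(absolute))
```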
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above the class constructor is used to connect to the database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the connect function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, the ''connect'' function raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level the conditions part is represented by 3 tables:<br />
<br />
<br />
 RUNS          CONDITIONS        CONDITION_TYPES<br />
 number  <--   run_num           name<br />
               type_id    -->    field_type<br />
               *_value           is_many_per_run<br />
               time<br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* Run number and other run information (like times of start and end) is stored in the runs table.<br />
* Names and type of value are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, each record of it is referenced to a run and to a condition_type.<br />
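<br />
The three-table layout can be tried out with the standard <code>sqlite3</code> module. The schema below is a simplified sketch: the table and column names follow the diagram above, while the <code>id</code> columns and the single <code>int_value</code> column are assumptions made for illustration, not the real RCDB schema.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (
        id INTEGER PRIMARY KEY,
        name TEXT,
        field_type TEXT,
        is_many_per_run INTEGER
    );
    CREATE TABLE conditions (
        id INTEGER PRIMARY KEY,
        run_num INTEGER REFERENCES runs(number),
        type_id INTEGER REFERENCES condition_types(id),
        int_value INTEGER,
        time TEXT
    );
    INSERT INTO runs (number) VALUES (100);
    INSERT INTO condition_types (id, name, field_type, is_many_per_run)
        VALUES (1, 'event_count', 'int', 0);
    INSERT INTO conditions (run_num, type_id, int_value) VALUES (100, 1, 1663);
""")

# "event_count for run 100" means joining all three tables
row = conn.execute("""
    SELECT runs.number, condition_types.name, conditions.int_value
    FROM conditions
    JOIN runs ON runs.number = conditions.run_num
    JOIN condition_types ON condition_types.id = conditions.type_id
    WHERE runs.number = 100 AND condition_types.name = 'event_count'
""").fetchone()

print(row)
conn.close()
```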
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number # int - The run number<br />
    start_time # datetime - Run start time<br />
    end_time # datetime - Run end time<br />
    conditions # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name # str(max 255) - A name of condition<br />
    value_type # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run # bool - True if the value is allowed many times per run<br />
    values # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time # datetime - time related to the condition (e.g. when it occurred)<br />
    run_number # int - the run number<br />
<br />
    @property<br />
    value # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value # holds data if type is STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value # holds data if type is INT_FIELD<br />
    float_value # holds data if type is FLOAT_FIELD<br />
    bool_value # holds data if type is BOOL_FIELD<br />
<br />
    run # Run - Run object associated with the run_number<br />
    type # ConditionType - link to associated condition type<br />
    name # str - link to type.name. See ConditionType.name<br />
    value_type # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of these fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call ''Condition.value'' property, Condition class checks for ''type.value_type'' and returns<br />
an appropriate ''xxx_value''.<br />
<br />
<br />
'''Why is it so?''' Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
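<br />
The dispatch performed by the ''value'' property can be sketched in a few lines. <code>ToyCondition</code> below is a hypothetical illustration, not the RCDB model class; the field names follow the storage table above, and time values are left out because the table does not list their storage field.<br />
<br />
```python
class ToyCondition(object):
    """Hypothetical illustration of type-dependent storage fields."""

    # value type -> name of the attribute that holds the data
    STORAGE_FIELDS = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
    }

    def __init__(self, value_type, value):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        # Store the value in the field that matches the declared type
        setattr(self, self.STORAGE_FIELDS[value_type], value)

    @property
    def value(self):
        # Read the value back from the field that matches the declared type
        return getattr(self, self.STORAGE_FIELDS[self.value_type])


count = ToyCondition("int", 1663)
config = ToyCondition("string", "pulser.conf")

print(count.value, count.int_value, count.text_value)
print(config.value, config.text_value, config.int_value)
```
<br />
Because an int lands in its own typed field, a database can compare it numerically, which is what a query like ''event_count > 100 000'' needs.<br />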
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
value_type=ConditionType.INT_FIELD,<br />
is_many_per_run=False,<br />
description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use types are presented in the next section<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - API allows '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected from run conditions. Sometimes the intent is to<br />
have strictly one name-value pair for a run; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; hall "''temperature''" and "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows several values for different times to be set under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
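<br />
The two behaviours can be modelled with a small in-memory store. <code>ToyStore</code> below is a hypothetical sketch, not the RCDB implementation; the condition names are taken from the explanation above and the values and times are made up for illustration.<br />
<br />
```python
class ToyStore(object):
    """Hypothetical in-memory store showing the is_many_per_run check."""

    def __init__(self):
        self.is_many = {}   # condition name -> is_many_per_run flag
        self.values = {}    # (run number, name) -> list of (value, time)

    def create_condition_type(self, name, is_many_per_run):
        self.is_many[name] = is_many_per_run

    def add_condition(self, run, name, value, time=None):
        stored = self.values.setdefault((run, name), [])
        if (value, time) in stored:
            return                      # same value and time: nothing to do
        if stored and not self.is_many[name]:
            raise ValueError("%s already has a value for run %d" % (name, run))
        stored.append((value, time))


store = ToyStore()
store.create_condition_type("total_events", is_many_per_run=False)
store.create_condition_type("temperature", is_many_per_run=True)

# Many values per run: a second value at another time is accepted
store.add_condition(100, "temperature", 21.5, "10:00")
store.add_condition(100, "temperature", 22.0, "11:00")

# One value per run: a second, different value is rejected
store.add_condition(100, "total_events", 1663)
try:
    store.add_condition(100, "total_events", 2000)
    rejected = False
except ValueError:
    rejected = True

print(len(store.values[(100, "temperature")]), rejected)
```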
<br />
<br />
'''description''' - A human-readable description, 255 characters max, that other users can see. It is optional, but filling it in is very<br />
good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types one of the fields should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
For example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value using the standard Python datetime. Revisiting the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case ''Condition.value'' field will have time information and time can be passed as<br />
value parameter of add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. Then you are able to add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if there is only one object selected.<br />
<br />
Example<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values are allowed for the run (is_many_per_run=True)<br />
# 1. If the run already has this condition with the same value and actual_time, the function does nothing<br />
# 2. If the run has this condition but at a different time, the new condition is added to the DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
The example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This is done to give you better control over serialization.<br />
It means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and<br />
'''after you get the condition back, Condition.value contains a string'''.<br />
<br />
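One practical consequence of doing the serialization yourself is that you decide how values that JSON does not know about, such as datetime objects, are written. The sketch below uses only the standard json module; the helper name encode_extra and the stored dictionary are made up for illustration and are not part of RCDB.<br />
<br />
```python
import json
from datetime import datetime


def encode_extra(obj):
    """Fallback for json.dumps: called only for objects json cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("Cannot serialize %r" % (obj,))


# Hypothetical data that one might want to keep in a JSON_FIELD condition
beam_info = {"energy": 12.0, "measured_at": datetime(2015, 9, 1, 14, 21, 1)}

# sort_keys makes the resulting string reproducible
text = json.dumps(beam_info, default=encode_extra, sort_keys=True)

# 'text' is a plain string, which is what add_condition expects for JSON_FIELD
restored = json.loads(text)
```
<br />
Note that the datetime comes back from json.loads as a string; converting it back to a datetime is again up to you.<br />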
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<pre><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</pre><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As one can notice, strings come back as unicode after JSON deserialization (look at u'x' instead of just 'x').<br />
It is not a problem if you just work with the restored data, because Python works seamlessly with unicode strings.<br />
As you can see in the example, we use the usual string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to plain strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project that can be<br />
installed with pip. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
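<br />
If adding a dependency is not desirable, a simple object whose attributes are all JSON-compatible can be stored with the standard json module alone. This is only a sketch under that assumption; the helper names cat_to_json and cat_from_json are hypothetical, and unlike jsonpickle nothing here records the class name in the stored string.<br />
<br />
```python
import json


class Cat(object):
    def __init__(self, name, mice_eaten=1230):
        self.name = name
        self.mice_eaten = mice_eaten


def cat_to_json(cat):
    # vars() returns the dictionary of instance attributes
    return json.dumps(vars(cat), sort_keys=True)


def cat_from_json(text):
    # The caller has to know that this string describes a Cat
    return Cat(**json.loads(text))


# The string that would be passed to add_condition for a JSON_FIELD condition
stored_text = cat_to_json(Cat("Alice"))

# ... and the object rebuilt from Condition.value
loaded_cat = cat_from_json(stored_text)
```
<br />
The price of this simplicity is that the reading side must know which class to rebuild, whereas jsonpickle stores that information in the "py/object" key.<br />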
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type for that.<br />
<br />
The short instruction is much like the one for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object any way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it the same way<br />
<br />
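As an illustration of the steps above, the sketch below packs an array of numbers into bytes and turns them into a plain ASCII string. The choice of struct and base64 is an assumption made for this example only; RCDB does not prescribe any particular encoding, and the database calls themselves are left out.<br />
<br />
```python
import base64
import struct

samples = [1.5, -2.25, 1024.0]

# Serialize: an unsigned int with the element count, followed by little-endian doubles
raw = struct.pack("<I%dd" % len(samples), len(samples), *samples)

# base64 turns arbitrary bytes into ASCII text that is safe to store as a string
text = base64.b64encode(raw).decode("ascii")

# ... 'text' is what would be saved to the DB and later read back ...

# Deserialize: reverse the two steps
decoded = base64.b64decode(text)
(count,) = struct.unpack_from("<I", decoded, 0)
restored = list(struct.unpack_from("<%dd" % count, decoded, 4))
```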
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table behind it, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a very readable string nor JSON. It still has to be converted to some string, and hopefully it will never be needed.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if the condition value for this run with this name already exists in the DB?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing<br />
# If the value OR actual_time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly: the value is replaced if ''replace=True'', otherwise an error is raised<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed per run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING<br />
# If the run has this condition but at a different time, the function adds the new value to the DB<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag and behaves accordingly<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
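<br />
The rules from this section can be summarized as a small toy model. This is not the RCDB implementation: add_value and OverrideError are hypothetical names invented for this sketch, and the "database" is just a list of (value, time) pairs for one run and one condition name.<br />
<br />
```python
from datetime import datetime


class OverrideError(Exception):
    """Stand-in for RCDB's OverrideConditionValueError in this toy model."""


def add_value(stored, value, time=None, replace=False, many_per_run=False):
    """Apply the add/replace rules to 'stored', a list of (value, time) pairs."""
    if many_per_run:
        # Only an entry recorded for the same time can clash with the new one
        clashing = [i for i, (_, t) in enumerate(stored) if t == time]
    else:
        # With a single value per run, any existing entry clashes
        clashing = list(range(len(stored)))

    if not clashing:
        stored.append((value, time))
        return "added"
    index = clashing[0]
    if stored[index] == (value, time):
        return "unchanged"
    if not replace:
        raise OverrideError("a different value or time is already stored")
    stored[index] = (value, time)
    return "replaced"


time1 = datetime(2015, 9, 1, 14, 21, 1, 222)
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)

# Replay the successful calls of the many-values-per-run example
multi = []
log = [
    add_value(multi, 1000, many_per_run=True),
    add_value(multi, 1000, many_per_run=True),
    add_value(multi, 2222, replace=True, many_per_run=True),
    add_value(multi, 3333, time1, many_per_run=True),
    add_value(multi, 4444, time2, many_per_run=True),
]
```
<br />
Calling add_value(multi, 4444, time1, many_per_run=True) at this point raises OverrideError, which mirrors the "Value differs for time1" case in the example.<br />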
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes with the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's see a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It actually isn't loaded even when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should be avoided. But sometimes it is needed, especially in the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs are now accessible by '''new_var'''<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and object manipulation with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions are covered further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import Run<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's see a more advanced example:<br />
<syntaxhighlight lang="python"><br />
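import rcdb<br />
from rcdb.model import Run<br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query (a trailing backslash continues the statement on the next line)<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # exactly one event_count value per run is expected here<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />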
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one SELECT query to the DB to get its<br />
conditions. It might be OK in many cases.<br />
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import Run, Condition, ConditionType<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
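For readers more at home with SQL, the sketch below runs roughly the same selection by hand with the standard sqlite3 module. The three-table layout (runs, condition_types, conditions) and its column names are a simplified assumption made for this illustration, not a description of the actual RCDB schema.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conditions (run_number INTEGER, condition_type_id INTEGER, int_value INTEGER);
    INSERT INTO condition_types VALUES (1, 'event_count');
""")

# Toy data: run N has event_count = N * 50000
for number in (1, 2, 3, 4):
    conn.execute("INSERT INTO runs VALUES (?)", (number,))
    conn.execute("INSERT INTO conditions VALUES (?, 1, ?)", (number, number * 50000))

# SQL counterpart of query(Run).join(...).join(...).filter(...).filter(...).order_by(...)
sql = """
    SELECT runs.number
    FROM runs
    JOIN conditions ON conditions.run_number = runs.number
    JOIN condition_types ON condition_types.id = conditions.condition_type_id
    WHERE condition_types.name = 'event_count' AND conditions.int_value > 100000
    ORDER BY runs.number
"""
selected = [row[0] for row in conn.execute(sql)]
```
<br />
Each chained call contributes one clause: the two joins become JOIN ... ON, the two filters are combined with AND in the WHERE clause, and order_by becomes ORDER BY.<br />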
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Let's imagine the next example. We look for runs in the range 1000 to 2000 that have event_count > 10000 or a data_value between 1.2 and 2.4 (the code below combines the two conditions with OR):<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the usual logical operators (Python's '''and''' and '''or''', or '''&&''' and '''||''' in other languages).<br />
SQLAlchemy overloads these operators to build SQL expressions.<br />
<br />
Note also that such expressions should be in parentheses, because in Python '''&''' and '''|''' bind tighter than comparison operators. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
The RCDB ConditionType class provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
....query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field (Condition.int_value here) to compare.<br />
<br />
<br />
But there is a limitation: each condition type should have its own query. Queries can be combined with the '''union''' or<br />
'''intersect''' methods later.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import ConditionType, Run<br />
<br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)         # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)    # data_value in range 1 - 2<br />
<br />
<br />
# Demonstrate ConditionType query helpers<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000 and run number <= 53<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <= 53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7 and run number < 55<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
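<br />
The run numbers in this output can be cross-checked without any database. The sketch below recomputes the two selections from the formulas used to fill the dummy data and combines them with ordinary Python sets; it only illustrates what intersect and union mean and does not use RCDB or SQLAlchemy.<br />
<br />
```python
# Same dummy data as in the example: run i has
# event_count = i + 950 and data_value = i / 100.0 + 1
runs = range(0, 100)

# event_count > 1000 and run number <= 53
by_event_count = set(i for i in runs if i + 950 > 1000 and i <= 53)

# data_value between 1.52 and 1.7 (inclusive, like SQL BETWEEN) and run number < 55
by_data_value = set(i for i in runs if 1.52 <= i / 100.0 + 1 <= 1.7 and i < 55)

# runs present in both selections, and runs present in at least one of them
intersect = sorted(by_event_count & by_data_value)
union = sorted(by_event_count | by_data_value)
```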
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need for create_empty_sqlite.py.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system, which stores information about what is going on in the ''log_records'' table of the same<br />
database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations with condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first None means there is no specific database object ID for this message. The last 0 means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you do <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you do several actions with one object, like adding many conditions for one run or adding one condition to many runs, reusing the object could boost performance up to 30% each.<br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> goes to committing changes to the DB.<br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument.<br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful.<br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed<br />
manually later.<br />
<br />
<br />
''auto_commit''='''False''' serves two purposes:<br />
<br />
* Making a lot of changes and committing them at once, which improves performance<br />
* Rolling changes back<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>.<br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
(!) Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
<br />
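The same commit-later pattern can be tried outside RCDB with the standard sqlite3 module. The sketch below is plain sqlite3, not RCDB code, and the table layout is made up for the illustration; it shows one commit for many inserts and a rollback of a pending change.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER)")
conn.commit()

# Many changes, one commit: the inserts stay pending until commit() is called
for run_number in range(1000):
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)", (run_number, "ac", 10))
conn.commit()

select_sql = "SELECT value FROM conditions WHERE run_number = 1"

# A pending change is already visible on the same connection ...
conn.execute("UPDATE conditions SET value = 30 WHERE run_number = 1")
pending = conn.execute(select_sql).fetchone()[0]

# ... and can be thrown away instead of committed
conn.rollback()
restored = conn.execute(select_sql).fetchone()[0]

row_count = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]
```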
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a ccdb-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is actually located<br />
in the same place as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions with their types and descriptions (if any), use '''-l''' or '''--list''':<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To list condition names only, use '''--list-names''':<br />
<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Getting value by the run number===<br />
To see all conditions and values for a run:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Add a condition name to get the value of that condition only:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done only once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just a time. (You can always add time information to any other type)<br />
* blob - binary blob. Avoid it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please, make names shorter<br />
<br />
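The two mechanical points of this policy are easy to check before creating a condition type. The helper below is a hypothetical sketch and not part of RCDB or rcnd; it only covers the rules about spaces and length, since "full words" cannot be checked automatically.<br />
<br />
```python
def check_condition_name(name):
    """Return a list of naming-policy remarks; an empty list means the name looks fine."""
    remarks = []
    if " " in name:
        remarks.append("contains spaces, use '_' instead")
    if len(name) > 255:
        remarks.append("longer than 255 characters")
    return remarks


good = check_condition_name("event_count")
bad = check_condition_name("event count")
```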
<br />
<br />
Write a value for condition 'my_value' to run 1000:<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without '''--replace''' an error is raised if run 1000 already has a different value for 'my_value'.<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66212RCDB conditions python2015-04-15T21:38:34Z<p>Romanov: /* Writing data */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are the way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options of conditions include:<br />
<br />
* A condition can also hold the time of occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, which looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. But users can use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how Python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is a small handy command line tool, '''rcnd''', which allows viewing RCDB conditions and writing values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved independently and attached to each run.<br />
<br />
For bulk data, it is better to use other RCDB options. For example, RCDB provides a file saving mechanism.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
There are ''environment.bash'' and ''environment.csh'' scripts, which automatically set<br />
the environment variables needed to use rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory,<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH''',<br />
* appends the rcdb bin folder to '''$PATH'''<br />
<br />
<br />
3. '''Choose database'''<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
This is the minimum needed to start with RCDB conditions: putting values in and getting them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to the DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections walk through this example and explain it thoroughly.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently provide fewer possibilities for data manipulation than the Python API.<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating condition type (needs to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in [[#Command line tools]] section below.<br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It takes a connection<br />
string that carries the database parameters. It also provides functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations on these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite are the databases intended for use. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite has no user_name, password or host, the connection string starts with three slashes ///.<br />
Thus an absolute path to a file gives four slashes ////:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
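<br />
The slash rule can be checked with a few lines of plain Python. This is a minimal sketch that uses neither RCDB nor SQLAlchemy; the helper name <code>sqlite_file_path</code> is hypothetical and only illustrates how the part after <code>sqlite:///</code> is read as a file path.<br />

```python
def sqlite_file_path(connection_string):
    """Return the file path part of an SQLite connection string.

    Hypothetical helper for illustration only; it is not part of RCDB.
    """
    prefix = "sqlite:///"
    if not connection_string.startswith(prefix):
        raise ValueError("not an SQLite connection string: " + connection_string)
    # Everything after the three slashes is the path to the file
    return connection_string[len(prefix):]


# Relative path: three slashes in total
relative = sqlite_file_path("sqlite:///example.db")

# Absolute path: the fourth slash is the leading '/' of the path itself
absolute = sqlite_file_path("sqlite:////home/user/example.db")
```

Here <code>relative</code> is <code>example.db</code> and <code>absolute</code> is <code>/home/user/example.db</code>.<br />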
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation]<br />
<br />
<br />
In the example above, the class constructor is used to connect to the database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't actually connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, ''connect'' raises exceptions if the connection string has a wrong format<br />
or required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS          CONDITIONS        CONDITION_TYPES<br />
number  <--   run_num           name<br />
              type_id     -->   field_type<br />
              *_value           is_many_per_run<br />
              time<br />
</pre><br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, each record of which references a run and a condition_type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other information<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number                      # int - The run number<br />
    start_time                  # datetime - Run start time<br />
    end_time                    # datetime - Run end time<br />
    conditions                  # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name                        # str(max 255) - A name of condition<br />
    value_type                  # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run             # bool - True if the value is allowed many times per run<br />
    values                      # query[Condition] - query to look condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time                        # datetime - time related to condition (when it occurred in example)<br />
    run_number                  # int - the run number<br />
<br />
    @property<br />
    value                       # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value                  # holds data if type STRING_FIELD,JSON_FIELD or BLOB_FIELD<br />
    int_value                   # holds data if type INT_FIELD<br />
    float_value                 # holds data if type FLOAT_FIELD<br />
    bool_value                  # holds data if type BOOL_FIELD<br />
<br />
    run                         # Run - Run object associated with the run_number<br />
    type                        # ConditionType - link to associated condition type<br />
    name                        # str - link to type.name. See ConditionType.name<br />
    value_type                  # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of these fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call ''Condition.value'' property, Condition class checks for ''type.value_type'' and returns<br />
an appropriate ''xxx_value''.<br />
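<br />
The dispatch performed by the ''value'' property can be pictured with a small stand-alone class. This is only a sketch under assumptions: <code>ConditionSketch</code> is a hypothetical stand-in, not the real RCDB ''Condition'' class, and the mapping simply mirrors the table above (with TIME_FIELD mapped to the ''time'' attribute, as the time example later in this tutorial suggests).<br />

```python
class ConditionSketch(object):
    """Simplified stand-in for the Condition class (illustration only)."""

    # Maps a value type name to the attribute that actually holds the data
    _STORAGE = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
        "time": "time",
    }

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        self.time = None

    @property
    def value(self):
        # Pick the storage attribute that matches the declared type
        return getattr(self, self._STORAGE[self.value_type])

    @value.setter
    def value(self, new_value):
        setattr(self, self._STORAGE[self.value_type], new_value)


count = ConditionSketch("int")
count.value = 1663          # lands in int_value

note = ConditionSketch("json")
note.value = '{"x": 1}'     # lands in text_value
```

The caller only ever touches ''value''; which typed field is used stays an internal detail.<br />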
<br />
<br />
'''Why is it so?''' Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
That is, if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses a so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs, to be interpreted later.<br />
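<br />
To make the benefit of typed columns concrete, here is a minimal sketch using only the standard <code>sqlite3</code> module and an in-memory database. The schema is a simplified assumption made for this illustration (it is not the actual RCDB schema), and the run numbers and values are made up.<br />

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT, value_type TEXT);
    CREATE TABLE conditions (
        run_number  INTEGER,
        type_id     INTEGER REFERENCES condition_types(id),
        text_value  TEXT,
        int_value   INTEGER,
        float_value REAL,
        bool_value  INTEGER
    );
    INSERT INTO condition_types VALUES (1, 'event_count', 'int');
    INSERT INTO condition_types VALUES (2, 'target_material', 'string');
""")

# Each value goes into the column that matches its declared type
rows = [
    (100, 1, None, 1663, None, None),
    (101, 1, None, 250000, None, None),
    (102, 1, None, 1200000, None, None),
    (101, 2, "LH2", None, None, None),
]
con.executemany("INSERT INTO conditions VALUES (?, ?, ?, ?, ?, ?)", rows)

# Because event_count lives in an integer column, the comparison is numeric
query = """
    SELECT c.run_number
    FROM conditions AS c
    JOIN condition_types AS t ON t.id = c.type_id
    WHERE t.name = 'event_count' AND c.int_value > 100000
    ORDER BY c.run_number
"""
big_runs = [row[0] for row in con.execute(query)]
```

With this made-up data <code>big_runs</code> contains runs 101 and 102. Had the counts been stored as text, the database would have had to cast every value before comparing.<br />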
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. This is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<pre><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</pre><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
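<br />
Since the API does not check names, a script that creates condition types may want to check them itself. The following is a minimal sketch; <code>is_safe_condition_name</code> is a hypothetical helper, not an RCDB function, and "no whitespace" is just the rule suggested above.<br />

```python
import re


def is_safe_condition_name(name):
    """Return True if the name is non-empty and contains no whitespace.

    Hypothetical helper for illustration; RCDB itself performs no such check.
    """
    return bool(name) and re.search(r"\s", name) is None


ok_name = is_safe_condition_name("category/sub/name")   # slashes are just characters
bad_name = is_safe_condition_name("my value")           # contains a space
```

Such a check would be called before ''create_condition_type''.<br />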
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use types are presented in the next section<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - the API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - the API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are assumed for run conditions. Sometimes it is intended to<br />
have strictly one name-value pair for a run. "''total_events''" or "''target_material''" are examples. If<br />
''is_many_per_run=False'', then the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run. Hall "''temperature''" or "''current''" are examples.<br />
If ''is_many_per_run=True'', then the API allows setting several values for different times under the same name for the same run.<br />
<br />
More examples of this are given in [[#Replacing previous values]]<br />
<br />
<br />
'''description''' - A human-readable description of at most 255 characters that other users can see. It is optional, but filling it in is very<br />
good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of these field types should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that. Let's revisit the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case ''Condition.value'' field will have time information and time can be passed as<br />
value parameter of add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You are then able to add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If run has this condition, with the same value and actual_time the func. DOES NOTHING<br />
# 2. If run has this conditions but at different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended for storing arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays and dictionaries this is easily done with the standard json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So serialization is on your side. This is done to give you better control over serialization.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the usual string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to plain strings looks like an easy option.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
In short, the procedure is much like for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in whatever way you like<br />
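<br />
The serialize and deserialize steps above can be sketched with the standard library alone. This is one possible choice, not something RCDB prescribes: the floats are packed into bytes with <code>struct</code> and turned into text with <code>base64</code>, so the result can be stored as a string. The sample values are made up, and the database calls are left out.<br />

```python
import base64
import struct

samples = [0.5, 1.25, -3.0]

# Serialize: pack the floats as little-endian doubles, then base64-encode to text
packed = struct.pack("<%dd" % len(samples), *samples)
blob_text = base64.b64encode(packed).decode("ascii")

# blob_text is the string that would be passed as the value of a BLOB_FIELD condition

# Deserialize: decode the base64 text and unpack the doubles (8 bytes each)
raw = base64.b64decode(blob_text)
restored = list(struct.unpack("<%dd" % (len(raw) // 8), raw))
```

After the round trip <code>restored</code> holds the same three numbers as <code>samples</code>.<br />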
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human-readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. Hopefully it will never be needed.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly: it replaces the value if ''replace=True'' and raises an error otherwise.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function does nothing<br />
# If the run has this condition but at a different time, the function adds the new value to the DB<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag and behaves accordingly<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
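<br />
The replacement rules for both modes can be summarised in a small in-memory model. This is a sketch written for this tutorial, not the RCDB implementation: <code>ConditionStore</code> and <code>OverrideError</code> are hypothetical names, and a plain dictionary stands in for the database.<br />

```python
from datetime import datetime


class OverrideError(Exception):
    """Raised when an existing value would be changed without replace=True."""


class ConditionStore(object):
    """Toy model of the add_condition rules for one condition name."""

    def __init__(self, is_many_per_run):
        self.is_many_per_run = is_many_per_run
        self.records = {}  # run number -> list of (value, time)

    def add(self, run, value, time=None, replace=False):
        records = self.records.setdefault(run, [])
        if (value, time) in records:
            return  # same value and same time: nothing to do
        if self.is_many_per_run:
            # Only a record with the same time can conflict
            clash = [i for i, (_, t) in enumerate(records) if t == time]
        else:
            # Any existing record conflicts
            clash = list(range(len(records)))
        if not clash:
            records.append((value, time))
        elif replace:
            records[clash[0]] = (value, time)
        else:
            raise OverrideError("value exists; pass replace=True to overwrite")


time1 = datetime(2015, 9, 1, 14, 21, 1, 222)
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)

many = ConditionStore(is_many_per_run=True)
many.add(1, 1000)                  # first addition, time is None
many.add(1, 1000)                  # same value and time: does nothing
try:
    many.add(1, 2222)              # different value for time None
    conflict = False
except OverrideError:
    conflict = True
many.add(1, 2222, replace=True)    # replaces the value for time None
many.add(1, 3333, time1)           # new time: added
many.add(1, 4444, time2)           # new time: added

single = ConditionStore(is_many_per_run=False)
single.add(1, 1, time1)
single.add(1, 5, time2, replace=True)   # the only value is replaced
```

In this model the "many" store ends with three records for run 1 and the "single" store with one, matching the two examples above.<br />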
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's see a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't loaded even when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
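<br />
The idea of loading on first use can be illustrated without a database. The sketch below is a plain-Python model of a lazily loaded collection; it is not how SQLAlchemy is implemented, and <code>LazyConditions</code> and <code>fake_database_query</code> are hypothetical names used only here.<br />

```python
class LazyConditions(object):
    """Collection that fetches its items only when they are first needed."""

    def __init__(self, loader):
        self._loader = loader     # function that performs the (pretend) query
        self._items = None        # nothing loaded yet
        self.query_count = 0

    def _load(self):
        if self._items is None:
            self.query_count += 1
            self._items = self._loader()
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


def fake_database_query():
    # Stands in for "SELECT all conditions of this run"
    return [("my_val", 2000)]


conditions = LazyConditions(fake_database_query)
x = conditions                              # just another reference, no query yet
queries_before = conditions.query_count

names = [name for name, value in x]         # first real use triggers the query
values = [value for name, value in x]       # already loaded, no second query
queries_after = conditions.query_count
```

Here <code>queries_before</code> is 0 and <code>queries_after</code> is 1: the pretend query runs once, on the first iteration.<br />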
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially during the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible as '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with the <code>session.delete</code> function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
To get a Run object by run_number:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions      # conditions are described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can get the objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods), or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's see a more advanced example:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))    # desc comes from: from sqlalchemy import desc<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one SELECT query to the DB to get its<br />
conditions. That might be OK in many cases.<br />
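<br />
The cost of one extra SELECT per run can be seen with the standard <code>sqlite3</code> module. The sketch below does not use RCDB or SQLAlchemy; the table and column names are simplified assumptions made for the illustration. It fetches the same values first with one query per run and then with a single JOIN.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: not the real RCDB tables
conn.execute("CREATE TABLE runs (number INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
for number in range(50, 56):
    conn.execute("INSERT INTO runs VALUES (?)", (number,))
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)",
                 (number, "event_count", number * 10))
conn.commit()

# Pattern A: one query for the runs, then one more query for each run
statements = 1
run_numbers = [row[0] for row in conn.execute(
    "SELECT number FROM runs WHERE number BETWEEN 50 AND 55 ORDER BY number DESC")]
per_run = {}
for number in run_numbers:
    statements += 1
    row = conn.execute(
        "SELECT int_value FROM conditions WHERE run_number = ? AND name = ?",
        (number, "event_count")).fetchone()
    per_run[number] = row[0]

# Pattern B: a single JOIN returns the same values in one statement
joined = dict(conn.execute(
    "SELECT r.number, c.int_value FROM runs r "
    "JOIN conditions c ON c.run_number = r.number "
    "WHERE r.number BETWEEN 50 AND 55 AND c.name = ?",
    ("event_count",)).fetchall())
```

Here six runs cost seven statements in pattern A and one in pattern B. SQLAlchemy has eager loading options (for example <code>joinedload</code>) that aim at the same effect; whether they suit the RCDB model is not covered here.<br />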
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to use the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
.filter(ConditionType.name == "event_count")\<br />
.filter(Condition.int_value > 100000)\<br />
.order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, so they can be chained as many times as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already see the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Second, when you add more logic, the query becomes bulky.<br />
<br />
<br />
Consider the next example. We look for runs in the range 1000 to 2000 that have either event_count > 10000 or data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the usual logical operators (Python's '''and''' and '''or''').<br />
SQLAlchemy overloads these operators to build the SQL conditions.<br />
<br />
Note also that each comparison should be enclosed in parentheses, because '''&''' and '''|''' bind tighter than comparison operators in Python. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but that doesn't improve readability.<br />
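<br />
Why the parentheses matter can be shown without SQLAlchemy. The following is a minimal sketch in which <code>Expr</code> is a hypothetical stand-in for a column expression: it overloads the same operators but only records SQL-like text. It is not how SQLAlchemy is implemented.<br />

```python
class Expr(object):
    """Hypothetical stand-in for a column expression; it only builds text."""

    def __init__(self, text):
        self.text = text

    @staticmethod
    def _render(other):
        return other.text if isinstance(other, Expr) else repr(other)

    def __eq__(self, other):
        return Expr("{0} = {1}".format(self.text, self._render(other)))

    def __gt__(self, other):
        return Expr("{0} > {1}".format(self.text, self._render(other)))

    def __and__(self, other):
        return Expr("({0} AND {1})".format(self.text, self._render(other)))

    def __or__(self, other):
        return Expr("({0} OR {1})".format(self.text, self._render(other)))


name = Expr("name")
int_value = Expr("int_value")

# With parentheses, each comparison is evaluated first and then combined
combined = (name == "event_count") & (int_value > 10000)

# Without parentheses, Python evaluates "event_count" & int_value first,
# and & is not defined between a str and an Expr
try:
    name == "event_count" & int_value > 10000
    unparenthesized_error = None
except TypeError as error:
    unparenthesized_error = type(error).__name__
```

In this sketch the parenthesized form yields the text <code>(name = 'event_count' AND int_value > 10000)</code>, while the form without parentheses fails with a TypeError before any comparison is made.<br />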
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB ConditionType provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
db.session.query(Run).join(Run.conditions).join(Condition.type).filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field (Condition.int_value here) to compare.<br />
<br />
<br />
There is a limitation, though: each condition type needs its own query. Such queries can later be combined with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)         # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)    # data_value in range 1 - 1.99<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 < data_value < 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries in<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores information about what is going on in the '''log_records'''<br />
table of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you do <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you do several actions with one object, like adding many conditions to one run or adding one condition to many runs, reusing the objects could boost performance by up to 30% each. <br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Make a lot of changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>. <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
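<br />
The two purposes of a deferred commit, batching and rollback, can also be seen at the plain database level. The sketch below uses only the standard <code>sqlite3</code> module with an assumed, simplified <code>conditions</code> table; it does not call the RCDB API.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table: not the real RCDB schema
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
conn.commit()

# 1. Batching: many inserts, a single commit at the end
for run_number in range(100):
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)",
                 (run_number, "event_count", run_number + 950))
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]

# 2. Rollback: an uncommitted change is visible on the same connection...
conn.execute("UPDATE conditions SET int_value = 0 WHERE run_number = 1")
pending = conn.execute(
    "SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]

# ...and disappears again when the transaction is rolled back
conn.rollback()
restored = conn.execute(
    "SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]
```

As in the RCDB test above, the pending value is readable before the commit and the committed value returns after the rollback.<br />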
<br />
<br />
'''(!)''' Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a ccdb-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is located<br />
in the same directory as environment.bash.<br />
<br />
<br />
'''(!)''' '''rcnd''' doesn't offer all possible data manipulations.<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions with their types and descriptions (if present), use '''-l''' or '''--list''':<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To list condition names only, use '''--list-names''':<br />
<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Getting values ===<br />
To see all conditions and values for a run, give the run number:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Give the run number and a condition name to get a single value:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just a time. (You can always add time information to any other type)<br />
* blob - binary blob. Avoid it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please make names shorter<br />
<br />
<br />
<br />
Write a value for run 1000 for condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without '''--replace''', an error is raised if run 1000 already has a different value for 'my_value'.<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66211RCDB conditions python2015-04-15T21:34:59Z<p>Romanov: </p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile features of conditions include:<br />
<br />
* A condition can also hold the time of occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, which looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is developed using the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. But users can use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a database lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is a small handy command line tool, '''rcnd''', that lets you view RCDB conditions and write values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved independently for (and attached to) each run.<br />
<br />
For bulk data, it is better to use other RCDB options. For example, RCDB provides a file saving mechanism.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
There are ''environment.bash'' and ''environment.csh'' scripts, which automatically set<br />
the environment variables for rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' - to RCDB root directory,<br />
* appends '''$PYTHONPATH''' with $RCDB_HOME/python<br />
* appends '''$PATH''' with rcdb bin folder<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is considered to be the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, there is a create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum needed to start with RCDB conditions, to store values and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create an RCDBProvider object that connects to the DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections cover this example and explain in detail what it does.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently provide fewer possibilities for data manipulation than the Python API. <br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating condition type (need to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in [[#Command line tools]] section below.<br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It takes a connection<br />
string that carries the database parameters. It also provides functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite are the databases considered for use. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because an SQLite URL has no user name, password or host, it starts with three slashes ///.<br />
An absolute path to the file therefore gives four slashes ////:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
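<br />
The slash counting can be checked with the standard URL parser. This is a sketch written for Python 3: <code>sqlite_file_path</code> is a hypothetical helper, not part of RCDB or SQLAlchemy, and port 3306 is an assumed example value.<br />

```python
from urllib.parse import urlsplit


def sqlite_file_path(connection_string):
    """Hypothetical helper: extract the file path from an sqlite:// URL."""
    parts = urlsplit(connection_string)
    if parts.scheme != "sqlite":
        raise ValueError("not an SQLite connection string")
    # The host part between '//' and the next '/' is empty for SQLite;
    # dropping that one separating slash leaves the file path itself
    return parts.path[1:]


relative = sqlite_file_path("sqlite:///example.db")
absolute = sqlite_file_path("sqlite:////home/user/example.db")

# A MySQL connection string carries all of the parts
mysql = urlsplit("mysql://user_name:password@host:3306/database")
mysql_parts = (mysql.username, mysql.password, mysql.hostname,
               mysql.port, mysql.path[1:])
```

With three slashes the remaining path is relative (<code>example.db</code>); with four it keeps its leading slash and is therefore absolute.<br />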
<br />
<br />
More about connection strings can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above class constructor is used to connect to database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
#disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the connect function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, the ''connect'' function raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS          CONDITIONS      CONDITION_TYPES<br />
number  <--   run_num         name<br />
              type_id   -->   field_type<br />
              *_value         is_many_per_run<br />
              time<br />
</pre><br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* Run number and other run information (like times of start and end) is stored in the runs table.<br />
* Names and type of value are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, each record of it is referenced to a run and to a condition_type.<br />
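<br />
The three-table layout can be tried out with the standard <code>sqlite3</code> module. This is a simplified sketch, not the real RCDB schema: the <code>id</code> columns and the column types are assumptions, and only <code>int_value</code> stands in for the <code>*_value</code> fields.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (
        number INTEGER PRIMARY KEY
    );
    CREATE TABLE condition_types (
        id              INTEGER PRIMARY KEY,  -- assumed key column
        name            TEXT,
        field_type      TEXT,
        is_many_per_run INTEGER
    );
    CREATE TABLE conditions (
        id        INTEGER PRIMARY KEY,        -- assumed key column
        run_num   INTEGER REFERENCES runs(number),
        type_id   INTEGER REFERENCES condition_types(id),
        int_value INTEGER,
        time      TEXT
    );
    INSERT INTO runs (number) VALUES (100);
    INSERT INTO condition_types (id, name, field_type, is_many_per_run)
        VALUES (1, 'event_count', 'int', 0);
    INSERT INTO conditions (run_num, type_id, int_value)
        VALUES (100, 1, 1663);
""")

# The name comes from condition_types, the value from conditions,
# and the run number ties the record to the runs table
row = conn.execute(
    "SELECT t.name, c.int_value FROM conditions c "
    "JOIN condition_types t ON t.id = c.type_id "
    "WHERE c.run_num = ?", (100,)).fetchone()
```

The query returns the name-value pair <code>event_count</code> = <code>1663</code> for run 100, assembled from two tables.<br />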
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number           # int - The run number<br />
    start_time       # datetime - Run start time<br />
    end_time         # datetime - Run end time<br />
    conditions       # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name             # str(max 255) - A name of condition<br />
    value_type       # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run  # bool - True if the value is allowed many times per run<br />
    values           # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time             # datetime - time related to the condition (for example, when it occurred)<br />
    run_number       # int - the run number<br />
<br />
    @property<br />
    value            # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value       # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value        # holds data if type INT_FIELD<br />
    float_value      # holds data if type FLOAT_FIELD<br />
    bool_value       # holds data if type BOOL_FIELD<br />
<br />
    run              # Run - Run object associated with the run_number<br />
    type             # ConditionType - link to associated condition type<br />
    name             # str - link to type.name. See ConditionType.name<br />
    value_type       # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of the following fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
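<br />
The dispatch described above can be sketched in plain Python. This is only an illustration and not the actual RCDB implementation: the class ''ConditionSketch'' and the lookup table ''STORAGE_FIELD'' are hypothetical names, and only the field names and type strings are taken from the listing and table above. TIME_FIELD is left out because the table does not list a storage field for it.<br />

```python
# Hypothetical sketch: maps each value_type string to the attribute holding the data.
STORAGE_FIELD = {
    "string": "text_value",
    "json": "text_value",
    "blob": "text_value",
    "int": "int_value",
    "float": "float_value",
    "bool": "bool_value",
}


class ConditionSketch(object):
    """Stand-in for Condition; not the real RCDB class."""

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None

    @property
    def value(self):
        # Pick the storage attribute that matches the declared type
        return getattr(self, STORAGE_FIELD[self.value_type])

    @value.setter
    def value(self, new_value):
        setattr(self, STORAGE_FIELD[self.value_type], new_value)


event_count = ConditionSketch("int")
event_count.value = 1000
print(event_count.int_value)   # the integer lands in int_value
print(event_count.text_value)  # the other storage fields stay empty
```

Keeping the mapping in one table means that a new value type would only need one more entry there.<br />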
<br />
<br />
'''Why is it so?''' - because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time, we would like to store strings and more general data as blobs. To achieve this, RCDB uses a so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
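<br />
The benefit of typed storage fields can be seen in a small sketch that uses only the standard sqlite3 module. The schema here is a deliberate simplification and an assumption for illustration: a single hypothetical ''conditions'' table that carries the name inline, which is not the real RCDB layout.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE conditions (
                    run_number INTEGER,
                    name       TEXT,
                    int_value  INTEGER,
                    text_value TEXT)""")

rows = [
    (1, "event_count", 50000, None),
    (2, "event_count", 250000, None),
    (2, "comment", None, '{"quality": "good"}'),
]
conn.executemany("INSERT INTO conditions VALUES (?, ?, ?, ?)", rows)

# Because event_count lives in an integer column, the comparison is numeric
cursor = conn.execute(
    "SELECT run_number FROM conditions "
    "WHERE name = 'event_count' AND int_value > 100000")
runs = [row[0] for row in cursor]
print(runs)
```

The JSON text sits in the same table but in its own column, so it never takes part in the numeric comparison.<br />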
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. This is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (we add parameter names here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. However, spaces are likely to cause problems and are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines the type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - API allows '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected of run conditions. Sometimes the intent is to<br />
have strictly one name-value pair per run; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. At other times it is<br />
desirable to track how a value changes during a run; the hall "''temperature''" and "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows several values to be set for different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description of at most 255 characters that other users can see. It is optional, but filling it in is<br />
good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of the following fields should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that (see the first example again):<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
<br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' field holds the time information, and the time can be passed as<br />
the value parameter of the add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You are then able to add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, even<br />
if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If the run has this condition with the same value and time, the function DOES NOTHING<br />
# 2. If the run has this condition but at a different time, the condition is added to the DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Values for time1 and time2 are added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays and dictionaries this is easily done with the standard json module.<br />
<br />
<br />
The example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This is done to give you better control over serialization.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
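<br />
Having serialization on the caller's side makes it easy to control the exact string that is stored. The sketch below uses only the standard json module; the ''calibration'' dictionary is made-up example data, and the resulting string is what would be passed to ''add_condition''.<br />

```python
import json

calibration = {"slope": 1.5, "offsets": [0, 2, 4], "label": "test"}

# sort_keys gives a stable key order; compact separators drop the spaces
stored = json.dumps(calibration, sort_keys=True, separators=(",", ":"))
print(stored)

# Condition.value would hand back the same string, so decoding is also up to the caller
restored = json.loads(stored)
print(restored == calibration)
```

A stable key order means that saving the same data twice produces the same string, which matters when the API compares a new value with the one already stored.<br />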
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save them to the DB for run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the ordinary string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to plain strings looks like an easy option.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much like that for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in whatever way you like<br />
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. And I hope it will never be used.<br />
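<br />
Since a BLOB_FIELD value must still be a string, binary data needs a text encoding before it is saved. One possible approach, shown here as an assumption rather than an RCDB recommendation, is base64 from the standard library; the ''samples'' tuple is made-up data.<br />

```python
import base64
import struct

# Pack three 32-bit floats into raw bytes (little-endian)
samples = (0.5, 1.25, -2.0)
raw = struct.pack("<3f", *samples)

# base64 turns arbitrary bytes into an ASCII-only string that fits a text field
stored = base64.b64encode(raw).decode("ascii")

# After loading the string back, reverse both steps
restored = struct.unpack("<3f", base64.b64decode(stored))
print(stored)
print(restored)
```

The string in ''stored'' is what would be passed to ''add_condition'', and ''Condition.value'' would return it unchanged for decoding.<br />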
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if the condition value for this run with this name already exists in the DB?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING.<br />
# If the run has this condition but at a different time, the condition is added to the DB.<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
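<br />
The rules of this section can be modelled with a small in-memory sketch. It is not RCDB code: the dictionary ''store'', the function signature and the ''OverrideError'' class are hypothetical stand-ins that only mimic the behaviour described above.<br />

```python
class OverrideError(Exception):
    """Hypothetical stand-in for RCDB's OverrideConditionValueError."""


def add_condition(store, run, name, value, time=None, replace=False, many=False):
    # With many values per run each time gets its own slot;
    # otherwise there is a single slot per (run, name).
    key = (run, name, time) if many else (run, name)
    new = (value, time)
    old = store.get(key)
    if old == new:
        return                      # identical record: do nothing
    if old is not None and not replace:
        raise OverrideError("%s already set for run %s" % (name, run))
    store[key] = new                # first addition or explicit replace


store = {}
add_condition(store, 1, "event_count", 1000)         # first addition
add_condition(store, 1, "event_count", 1000)         # same value: no-op
try:
    add_condition(store, 1, "event_count", 2222)     # different value
except OverrideError as error:
    print("refused: %s" % error)
add_condition(store, 1, "event_count", 2222, replace=True)
print(store[(1, "event_count")])
```

With ''many=True'' the time becomes part of the key, so values at different times coexist while a different value at an existing time is still refused.<br />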
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't loaded even when we do something like <code>x = run.conditions</code>. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
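<br />
The idea of deferring the query until the data is really used can be illustrated in plain Python. This is a conceptual sketch only and not how SQLAlchemy is implemented: ''LazyCollection'' and ''fake_select'' are hypothetical names, and the counter stands in for SELECT statements.<br />

```python
class LazyCollection(object):
    """Loads its items through `loader` the first time they are really needed."""

    def __init__(self, loader):
        self._loader = loader
        self._items = None
        self.query_count = 0

    def _load(self):
        if self._items is None:
            self.query_count += 1        # stands in for one SELECT
            self._items = self._loader()
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


def fake_select():
    # Hypothetical replacement for the database query
    return ["event_count", "beam_current"]


conditions = LazyCollection(fake_select)
x = conditions                     # just a reference: nothing is loaded
print(conditions.query_count)
names = list(conditions)           # first real use triggers the load
print(conditions.query_count)
names_again = list(conditions)     # cached: no second query
print(conditions.query_count)
```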
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially during the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible by '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with the session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and manipulating SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # working with conditions is described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can actually get the objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's see a more advanced example:<br />
<syntaxhighlight lang="python"><br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # the "," unpacking expects exactly one event_count condition per run<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues a separate SELECT query to the DB to get its<br />
conditions. This might be OK in many cases.<br />
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will also use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows as many of them as needed to be chained.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already sense the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that the query becomes bulky as you add more logic.<br />
<br />
<br />
Let's consider the next example. We look for runs in the range 1000 to 2000 that have either event_count > 10000 or a data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used here instead of the usual logical operators (Python's '''and''' and '''or''').<br />
SQLAlchemy overloads these operators to build the logical conditions of the query.<br />
<br />
Note also that such expressions must be in parentheses, because '''&''' and '''|''' bind more tightly than the comparison operators. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
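<br />
Why the parentheses matter can be seen without SQLAlchemy. The ''Column'' class below is a hypothetical miniature that only records expressions as text; it is not SQLAlchemy's implementation, but it overloads the same operators.<br />

```python
class Column(object):
    """Hypothetical mini-column that records expressions as text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Column("%s = %r" % (self.text, other))

    def __gt__(self, other):
        return Column("%s > %r" % (self.text, other))

    def __and__(self, other):
        return Column("(%s AND %s)" % (self.text, other.text))

    def __or__(self, other):
        return Column("(%s OR %s)" % (self.text, other.text))


name = Column("name")
int_value = Column("int_value")

# Parenthesized comparisons combine as intended
good = (name == "event_count") & (int_value > 10000)
print(good.text)

# Without parentheses "&" binds first, so Python evaluates "event_count" & int_value
try:
    bad = name == "event_count" & int_value > 10000
except TypeError as error:
    print("TypeError: %s" % error)
```

In this sketch the unparenthesized form fails loudly; in general the safest habit is to wrap every comparison in its own parentheses before combining.<br />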
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
The RCDB ConditionType class provides helper properties that make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
... .query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field, here Condition.int_value, to compare.<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. Queries can, however, be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import ConditionType, Run<br />
<br />
# create in-memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)        # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)   # data_value in range 1.0 - 1.99<br />
<br />
<br />
# Demonstrate the ConditionType query helpers<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries can be found in:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores some information about what is going on in the ''log_records''<br />
table of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is based on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you perform a series of operations on conditions, it may be a good idea to leave a<br />
log message that other users can see.<br />
<br />
<br />
Custom data modifications through SQLAlchemy, such as creating or deleting objects manually and calling session.commit(), are not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you call <code>db.add_condition(1, "my_value", 10)</code>, the run and the condition type are queried inside the function. If you perform several actions with one object, such as adding many conditions to one run or adding one condition to many runs, reusing the run object or the condition type object can each improve performance by up to 30%.<br />
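<br />
Where the saving comes from can be sketched without RCDB at all. In the toy model below, ''get_condition_type'' and ''add_condition'' are hypothetical stand-ins (a dictionary lookup plays the role of the database query), and a counter records how many lookups each calling style triggers.<br />

```python
# Toy model: a dict lookup stands in for the database query (hypothetical, not RCDB code)
CONDITION_TYPES = {"my_value": {"id": 1, "name": "my_value"}}
lookups = {"count": 0}


def get_condition_type(name):
    """Stand-in for the query that resolves a condition name to its type object."""
    lookups["count"] += 1
    return CONDITION_TYPES[name]


def add_condition(run_number, name_or_type, value, storage):
    """Accept either a name (resolved on every call) or an already resolved type."""
    if isinstance(name_or_type, str):
        condition_type = get_condition_type(name_or_type)
    else:
        condition_type = name_or_type
    storage.append((run_number, condition_type["id"], value))


storage = []

# 1. Passing the name: one lookup per call
for run_number in range(1, 101):
    add_condition(run_number, "my_value", 10, storage)
lookups_by_name = lookups["count"]

# 2. Reusing the object: a single lookup for all calls
lookups["count"] = 0
condition_type = get_condition_type("my_value")
for run_number in range(1, 101):
    add_condition(run_number, condition_type, 10, storage)
lookups_by_object = lookups["count"]

print(lookups_by_name, lookups_by_object)
```

The same reasoning applies to the run object when many conditions are added to one run.<br />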
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition ===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> goes to committing changes to the DB.<br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument.<br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful.<br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy session and can be committed<br />
manually later.<br />
<br />
<br />
''auto_commit''='''False''' serves two purposes:<br />
<br />
* Make many changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>.<br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
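# Now we defer committing changes to the DB. The object stays in the SQLAlchemy session<br />
# (the trailing positional arguments are presumably time=None, replace=True, auto_commit=False)<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />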
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
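<br />
The same deferred-commit pattern can be reproduced with nothing but the standard ''sqlite3'' module, which may help to see what happens underneath. This is a minimal sketch, not RCDB code: the ''conditions'' table and its columns are simplified, hypothetical stand-ins for the real schema.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER)")


def current_value():
    """Read the value of 'ac' for run 1 through the same connection."""
    rows = conn.execute(
        "SELECT value FROM conditions WHERE run_number = 1 AND name = 'ac'"
    ).fetchall()
    return rows[0][0]


# Insert and commit: the value 10 is now stored in the database
conn.execute("INSERT INTO conditions VALUES (1, 'ac', 10)")
conn.commit()
committed = current_value()

# Two deferred changes: visible on this connection, but not committed yet
conn.execute("UPDATE conditions SET value = 20 WHERE run_number = 1 AND name = 'ac'")
conn.execute("UPDATE conditions SET value = 30 WHERE run_number = 1 AND name = 'ac'")
pending = current_value()

# Roll back: both pending changes are discarded at once
conn.rollback()
restored = current_value()

print(committed, pending, restored)
```

Calling <code>conn.commit()</code> instead of <code>conn.rollback()</code> would have written both pending changes in one transaction, which is where the performance gain of batching comes from.<br />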
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
'''(!)''' Note, however, that more complex scenarios involving uncommitted objects have not been tested.<br />
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a ccdb-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH once environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is actually located<br />
in the same place as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output:<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions with their types and descriptions (if any), use '''-l''' or '''--list''':<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To list only the condition names, use '''--list-names''':<br />
<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Getting values ===<br />
To see all conditions and values for a run, type:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Give a run number and a condition name to get a single value:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done only once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just time. (You can always add time information to any other type)<br />
* blob - binary blob. Don't use it if possible<br />
<br />
<br />
Names policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please keep names shorter<br />
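<br />
The two mechanical rules of the names policy (no spaces, at most 255 characters) are easy to check before a condition type is created. The helper below, ''check_condition_name'', is a hypothetical convenience function and not part of RCDB or '''rcnd'''.<br />

```python
MAX_NAME_LENGTH = 255  # limit stated in the names policy above


def check_condition_name(name):
    """Return a list of policy problems; an empty list means the name is fine."""
    problems = []
    if any(character.isspace() for character in name):
        problems.append("name contains whitespace, use '_' instead")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than %d characters" % MAX_NAME_LENGTH)
    return problems


good_name_problems = check_condition_name("event_count")
spaced_name_problems = check_condition_name("event count")
long_name_problems = check_condition_name("x" * 256)

print(good_name_problems)
print(spaced_name_problems)
print(long_name_problems)
```

The "full words are better" rule is a matter of taste and is deliberately left out of the check.<br />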
<br />
<br />
<br />
Write a value for run 1000 for condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without '''--replace''', an error is raised if run 1000 already has a different value for 'my_value'.<br />
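<br />
When '''rcnd''' has to be called from a python script, it helps to build the argument list in one place. The sketch below only assembles the arguments in the order used in the example above. ''rcnd_write_args'' is a hypothetical helper name, and the sketch assumes nothing about '''rcnd''' beyond the ''--write'' and ''--replace'' flags shown in this section.<br />

```python
def rcnd_write_args(run_number, name, value, replace=False):
    """Build the argument list for: rcnd --write <value> [--replace] <run> <name>."""
    args = ["rcnd", "--write", str(value)]
    if replace:
        args.append("--replace")
    args.extend([str(run_number), name])
    return args


write_args = rcnd_write_args(1000, "my_value", "value to write", replace=True)
plain_args = rcnd_write_args(100, "event_count", 1663)

print(write_args)
print(plain_args)
```

The resulting list can be passed to <code>subprocess.call(...)</code>. Passing a list instead of a single string means a value that contains spaces needs no shell quoting.<br />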
<br />
<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66210RCDB conditions python2015-04-15T21:33:31Z<p>Romanov: /* Command line tools */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options for conditions include:<br />
<br />
* A condition can also hold the time at which it occurred: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, so it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
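<br />
The two shapes described above can be pictured with plain python containers. This is only a mental model with made-up temperature readings; RCDB stores the data in database tables, not in dictionaries.<br />

```python
from datetime import datetime

# One value per run: name -> value
single_valued = {
    100: {"event_count": 1663},
}

# Many values per run: name -> [(value, time), ...]  (readings are made up)
multi_valued = {
    100: {
        "temperature": [
            (20.5, datetime(2015, 4, 15, 10, 0, 0)),
            (21.0, datetime(2015, 4, 15, 11, 0, 0)),
        ],
    },
}

event_count = single_valued[100]["event_count"]
temperatures = [value for value, time in multi_valued[100]["temperature"]]

print(event_count)
print(temperatures)
```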
<br />
<br />
This tutorial covers the RCDB conditions python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple,<br />
straightforward functions to manage conditions. Users can still use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is also a small, handy command line tool, '''rcnd''', that lets you view RCDB conditions and write values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved independently and attached to each run.<br />
<br />
For bulk data, it is better to use other RCDB options. For example, RCDB provides a file-saving mechanism.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
The RCDB SVN repository is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set the<br />
environment variables needed by rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory,<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH''',<br />
* appends the rcdb bin folder to '''$PATH'''<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
An SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum you need to start with RCDB conditions, to store values and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run the example:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections cover this example and explain thoroughly what is going on in it.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
Command line tools currently provide fewer possibilities for data manipulation than the python API.<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating condition type (need to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
More information and examples are in the [[#Command line tools]] section below.<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It uses connection<br />
strings to pass database parameters to the class. It also carries functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now we consider only MySQL and SQLite databases. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite doesn't have a user_name and password, the connection string starts with three slashes ///.<br />
Thus there are four slashes //// in front of an absolute file path:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
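<br />
The slash counting is easier to see in code. The sketch below builds SQLite connection strings from a relative and an absolute path, and splits a MySQL connection string into its parts with the standard library. ''sqlite_connection_string'' is a hypothetical helper, and port 3306 is an assumption used only to have a concrete number in the example.<br />

```python
from urllib.parse import urlsplit  # on Python 2 use: from urlparse import urlsplit


def sqlite_connection_string(path):
    """Prefix a file path with the three slashes SQLite connection strings use."""
    return "sqlite:///" + path


relative = sqlite_connection_string("example.db")
absolute = sqlite_connection_string("/home/user/example.db")

# mysql://user_name:password@host:port/database
parts = urlsplit("mysql://user_name:password@host:3306/database")
database = parts.path.lstrip("/")

print(relative)
print(absolute)
print(parts.username, parts.hostname, parts.port, database)
```

The fourth slash of the absolute form is simply the leading slash of the path itself.<br />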
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above, the class constructor is used to connect to the database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, ''connect'' raises exceptions if the connection string has the wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
 RUNS            CONDITIONS          CONDITION_TYPES<br />
 number   <--    run_num             name<br />
                 type_id     -->     field_type<br />
                 *_value             is_many_per_run<br />
                 time<br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* Run number and other run information (like times of start and end) is stored in the runs table.<br />
* Names and type of value are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table; each record references a run and a condition type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number            # int - The run number<br />
    start_time        # datetime - Run start time<br />
    end_time          # datetime - Run end time<br />
    conditions        # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name              # str(max 255) - A name of condition<br />
    value_type        # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run   # bool - True if the value is allowed many times per run<br />
    values            # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time              # datetime - time related to the condition (e.g. when it occurred)<br />
    run_number        # int - the run number<br />
<br />
    @property<br />
    value             # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value        # holds data if type is STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value         # holds data if type is INT_FIELD<br />
    float_value       # holds data if type is FLOAT_FIELD<br />
    bool_value        # holds data if type is BOOL_FIELD<br />
<br />
    run               # Run - Run object associated with the run_number<br />
    type              # ConditionType - link to associated condition type<br />
    name              # str - link to type.name. See ConditionType.name<br />
    value_type        # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of these fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you access the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
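<br />
The dispatch performed by the ''value'' property can be sketched as follows. The class below is a simplified stand-in written for this page, not the real ''rcdb.model.Condition''. It is not an ORM class and covers only the field types listed in the table above.<br />

```python
class Condition(object):
    """Simplified stand-in for illustration; not the real rcdb.model.Condition."""

    # value_type -> storage field, following the table above
    STORAGE_FIELDS = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
    }

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None

    @property
    def value(self):
        """Return the content of whichever storage field matches value_type."""
        return getattr(self, self.STORAGE_FIELDS[self.value_type])

    @value.setter
    def value(self, new_value):
        """Write to the storage field that matches value_type."""
        setattr(self, self.STORAGE_FIELDS[self.value_type], new_value)


event_count = Condition("int")
event_count.value = 1663

run_config = Condition("string")
run_config.value = "pulser.conf"

print(event_count.int_value, event_count.text_value)
print(run_config.text_value, run_config.int_value)
```

Callers read and write ''value'' only. Which column actually holds the data is decided by the type.<br />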
<br />
<br />
'''Why is it so?''' - Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
That is, if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time, we would like to store strings and more general data as blobs. To achieve this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs, to be interpreted later.<br />
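<br />
Why typed columns matter for such queries can be shown with the standard ''sqlite3'' module. The table below is a simplified, hypothetical stand-in for the real conditions table, and the event counts are made up.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conditions ("
    "run_number INTEGER, name TEXT, int_value INTEGER, text_value TEXT)"
)
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?, ?)",
    [
        (1, "event_count", 50000, None),
        (2, "event_count", 250000, None),
        (3, "event_count", 1000000, None),
        (2, "run_config", None, "pulser.conf"),
    ],
)

# "Give me runs where event_count > 100 000": the comparison is numeric
big_runs = [
    row[0]
    for row in conn.execute(
        "SELECT run_number FROM conditions "
        "WHERE name = 'event_count' AND int_value > 100000 "
        "ORDER BY run_number"
    ).fetchall()
]

# If the numbers were stored as text, the comparison would be lexicographic
text_comparison = "50000" > "100000"

print(big_runs)
print(text_comparison)
```

As text, "50000" sorts after "100000", so run 1 would wrongly pass the filter. Keeping ints in an int column avoids this.<br />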
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (we add parameter names here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
value_type=ConditionType.INT_FIELD,<br />
is_many_per_run=False,<br />
description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention, and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use types are presented in the next section<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - API allows '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected of run conditions. Sometimes the intention is to<br />
have strictly one name-value pair per run. "''total_events''" or "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run. Hall "''temperature''" or "''current''" are examples of this.<br />
If ''is_many_per_run=True'', the API allows several values to be set for different times under the same name for the same run.<br />
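<br />
The two behaviours can be modelled in a few lines of plain python. ''ConditionStore'' below is a hypothetical in-memory model written for this page, not RCDB code. Raising ''ValueError'' and using small integers as time stamps are assumptions made to keep the sketch short.<br />

```python
class ConditionStore(object):
    """In-memory model of the two behaviours; hypothetical, not RCDB code."""

    def __init__(self):
        self.is_many_per_run = {}   # condition name -> bool
        self.records = {}           # (run, name) -> list of (value, time)

    def create_condition_type(self, name, is_many_per_run):
        self.is_many_per_run[name] = is_many_per_run

    def add_condition(self, run, name, value, time=None, replace=False):
        records = self.records.setdefault((run, name), [])
        if (value, time) in records:
            return                          # identical record: do nothing
        if self.is_many_per_run[name]:
            records.append((value, time))   # keep every (value, time) pair
        elif not records or replace:
            records[:] = [(value, time)]    # strictly one value per run
        else:
            raise ValueError("run %d already has a value for '%s'" % (run, name))

    def get_condition(self, run, name):
        records = self.records.get((run, name), [])
        if self.is_many_per_run[name]:
            return list(records)
        return records[0] if records else None


store = ConditionStore()
store.create_condition_type("total_events", is_many_per_run=False)
store.create_condition_type("temperature", is_many_per_run=True)

store.add_condition(1, "total_events", 1663)
try:
    store.add_condition(1, "total_events", 2000)
    rejected = False
except ValueError:
    rejected = True
store.add_condition(1, "total_events", 2000, replace=True)

store.add_condition(1, "temperature", 20.5, time=1)
store.add_condition(1, "temperature", 21.0, time=2)

print(rejected, store.get_condition(1, "total_events"))
print(store.get_condition(1, "temperature"))
```

With ''is_many_per_run=False'' a second, different value is rejected unless ''replace'' is set. With ''is_many_per_run=True'' every new (value, time) pair is kept.<br />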
<br />
More examples are given in [[#Replacing previous values]]<br />
<br />
<br />
'''description''' - A human-readable description, 255 characters max, that other users can see. It is optional, but it is very<br />
good practice to fill it in.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types one of the fields should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard python datetime is used for that. Let's revisit the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case ''Condition.value'' field will have time information and time can be passed as<br />
value parameter of add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You are then able to add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If the run already has this condition with the same value and actual_time, the function DOES NOTHING<br />
# 2. If the run has this condition but at a different time, the new condition is added to the DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
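<br />
Once a list of conditions is returned, a common next step is to order the records by time or to pick the latest one. Because ''time'' may be None, it has to be handled explicitly. The sketch below uses a ''Record'' namedtuple as a hypothetical stand-in for Condition objects, with the same values and times as in the example above.<br />

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for Condition objects: only .value and .time are modelled
Record = namedtuple("Record", ["value", "time"])

results = [
    Record(4444, datetime(2015, 9, 1, 14, 21, 1, 333)),
    Record(2222, None),
    Record(3333, datetime(2015, 9, 1, 14, 21, 1, 222)),
]

# Records without a time go first, the rest are ordered chronologically
ordered = sorted(results, key=lambda r: (r.time is not None, r.time or datetime.min))
ordered_values = [record.value for record in ordered]

# Latest record among those that actually carry a time
timed = [record for record in results if record.time is not None]
latest = max(timed, key=lambda r: r.time)

print(ordered_values)
print(latest.value)
```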
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This gives you better control over serialization.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''', and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As one can notice, strings come back as unicode after JSON deserialization (look at u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the usual string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using pyYAML to deserialize to strings looks easy.<br />
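Unicode strings are not the only thing that changes in a JSON round trip. The short sketch below uses only the standard json module (no RCDB calls; the sample dictionary is made up for illustration) and shows two other changes worth knowing about before storing data in a JSON_FIELD: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Sample data, invented for this illustration
original = {"limits": (1.5, 2.5), 7: "seven"}

# This is the string that would be passed to add_condition for a JSON_FIELD
text = json.dumps(original)

# And this is what comes back after loading Condition.value
restored = json.loads(text)

print(text)
print(restored)
```

If the exact Python types matter, convert them back explicitly after loading.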
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom python objects to database, jsonpickle package could be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle Documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
In short, the procedure is much like the one for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it whatever way you like<br />
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the corresponding database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a very readable string nor JSON. But it still has to be converted to some string. And I hope it will never be used.<br />
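Since a BLOB_FIELD value still has to end up as a string, binary data needs a text encoding of some kind. The sketch below is one possible approach, not something RCDB prescribes: it packs a few floats into bytes with struct and turns them into ASCII text with base64. The sample values and variable names are made up for illustration.

```python
import base64
import struct

# Hypothetical binary payload: three little-endian doubles
samples = [0.5, 1.25, -3.0]
raw = struct.pack("<3d", *samples)

# Encode to ASCII text; this is the string that could be stored as a BLOB_FIELD value
text_value = base64.b64encode(raw).decode("ascii")

# Decode the stored text back to the original numbers
restored = list(struct.unpack("<3d", base64.b64decode(text_value)))

print(text_value)
print(restored)
```

Base64 makes the stored text about one third larger than the raw bytes, which is one more reason to prefer JSON where it fits.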
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if the condition value for this run with this name already exists in the DB?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR actual_time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING<br />
# If the run has this condition but at a different time, the new value is added to the DB<br />
# If the run has this condition at this time but with a different value, the ''replace'' flag is checked, as in the single-value case<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
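The behaviour shown in the two examples can be summarised as a toy in-memory model. This is only a sketch of the rules as they can be inferred from the examples, not RCDB's implementation: ''add_value'', ''OverrideError'' and the list of (value, time) pairs are hypothetical stand-ins, and plain strings are used in place of datetime objects.

```python
class OverrideError(Exception):
    """Stand-in for the error raised when a stored value would be overwritten."""


def add_value(stored, value, time=None, replace=False, many_per_run=False):
    """Apply the add/replace rules to `stored`, a list of (value, time) pairs."""
    if many_per_run:
        # Only a record with the same time can conflict with the new one
        matches = [i for i, (_, t) in enumerate(stored) if t == time]
    else:
        # A single-value condition conflicts with whatever is already stored
        matches = list(range(len(stored)))

    if not matches:
        stored.append((value, time))
        return "added"

    index = matches[0]
    if stored[index] == (value, time):
        return "unchanged"
    if not replace:
        raise OverrideError("a different value or time is already stored")
    stored[index] = (value, time)
    return "replaced"


# Replay of the many-values-per-run example, without the failing calls
multi = []
add_value(multi, 1000, many_per_run=True)
add_value(multi, 2222, replace=True, many_per_run=True)
add_value(multi, 3333, "time1", many_per_run=True)
add_value(multi, 4444, "time2", many_per_run=True)
print(multi)
```

The only difference between the two modes in this model is which stored records count as a conflict: any record for a single-value condition, or only the record with the same time when many values are allowed.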
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together, which makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't even loaded when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions for that run.<br />
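The idea of lazy loading can be illustrated without a database. The sketch below is a simplified stand-in and not how SQLAlchemy is implemented: ''LazyRun'' and ''fake_loader'' are hypothetical names, and this version loads the data on the first attribute access, whereas the text above describes loading being deferred even further.

```python
class LazyRun(object):
    """Toy run object that loads its conditions only when they are first used."""

    def __init__(self, number, loader):
        self.number = number
        self._loader = loader
        self._conditions = None
        self.query_count = 0  # how many times the "database" was asked

    @property
    def conditions(self):
        if self._conditions is None:
            self.query_count += 1
            self._conditions = self._loader(self.number)
        return self._conditions


def fake_loader(run_number):
    # Pretend database query returning (name, value) pairs
    return [("event_count", 1000), ("run_type", "cosmic")]


run = LazyRun(1, fake_loader)
before = run.query_count          # nothing has been loaded yet

names = [name for name, _ in run.conditions]
values = [value for _, value in run.conditions]

print(before)
print(run.query_count)            # loaded once, then reused
```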
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should be avoided. But sometimes it is needed, especially in the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs are now accessible by '''new_var'''<br />
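Why renaming works this way can be sketched with plain sqlite3. The schema below is an assumption made for illustration (the table and column names are not taken from RCDB): values point to their type by id, so changing the name in one row of the types table is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table layout: values reference their type by id, not by name
conn.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE conditions (
        run_number INTEGER,
        type_id INTEGER REFERENCES condition_types(id),
        value INTEGER
    );
    INSERT INTO condition_types (id, name) VALUES (1, 'my_var');
    INSERT INTO conditions VALUES (1, 1, 10), (2, 1, 20);
""")


def values_by_name(name):
    """Return values of all runs for the condition type with the given name."""
    rows = conn.execute(
        "SELECT c.value FROM conditions c "
        "JOIN condition_types t ON t.id = c.type_id "
        "WHERE t.name = ? ORDER BY c.run_number", (name,))
    return [row[0] for row in rows]


# Renaming touches a single row in the types table
conn.execute("UPDATE condition_types SET name = 'new_var' WHERE name = 'my_var'")
conn.commit()

print(values_by_name("new_var"))
print(values_by_name("my_var"))
```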
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and manipulating SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation]<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions are described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown further below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get conditions for each run. Let's see a more advanced example:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query (desc comes from: from sqlalchemy import desc)<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    event_count, = (condition.value for condition in run.conditions if condition.name=='event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run triggers a separate SELECT query to the DB to get its<br />
conditions. It might be OK for many cases.<br />
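For comparison, the sketch below shows the alternative to one query per run: fetching the values for the whole run range in a single query. It uses plain sqlite3 with a made-up single-table layout, so the table and column names are assumptions for illustration and not the RCDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flat table, just enough to show the access pattern
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?)",
    [(run, "event_count", 1000 + run) for run in range(50, 56)])
conn.commit()

# One query for the whole run range instead of one query per run
rows = conn.execute(
    "SELECT run_number, value FROM conditions "
    "WHERE name = 'event_count' AND run_number BETWEEN 50 AND 55 "
    "ORDER BY run_number DESC")
event_counts = dict(rows.fetchall())

print(event_counts)
```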
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Let's imagine the next example. We look for runs in the range 1000 to 2000 with event_count > 10000, some data_value in the range 1.2 to 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the common '''&&''' and '''||''' ('''and''' and '''or''' in python).<br />
SQLAlchemy overloads these operators to build query conditions.<br />
<br />
Note also that such expressions should be in parentheses. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
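The need for parentheses comes from python operator precedence: '''&''' and '''|''' bind tighter than comparisons such as '''==''' and '''>'''. The toy class below is not SQLAlchemy; ''Expr'' is a hypothetical stand-in that overloads the same operators and renders SQL-like text, which is enough to show the mechanism.

```python
class Expr(object):
    """Tiny expression node that renders itself as SQL-like text."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr("%s > %r" % (self.text, other))

    def __eq__(self, other):
        return Expr("%s = %r" % (self.text, other))

    def __and__(self, other):
        return Expr("(%s AND %s)" % (self.text, other.text))

    def __or__(self, other):
        return Expr("(%s OR %s)" % (self.text, other.text))


name = Expr("name")
int_value = Expr("int_value")

# With parentheses each comparison is evaluated first, then combined with &
clause = (name == "event_count") & (int_value > 10000)
print(clause.text)
```

Without the parentheses, python would evaluate <code>"event_count" & int_value</code> first. In this toy class that raises a TypeError; the exact failure mode with a real library may differ.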
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB ConditionType provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns query bootstrap that selects Run objects for given type. So it hides this thing from the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
....query(Run).join(Run.conditions).join(Condition.type) ... .filter(((ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for a given type. When you put '''t.value_field > 1000''' here, ConditionType '''t''' looks at its '''value_type''' and selects the right field, Condition.int_value, to compare<br />
<br />
<br />
But there is a limitation. Each condition type must have its own query. Queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
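What combining two per-type queries means can be shown with plain python sets, without a database. The sketch below is not RCDB code: it builds made-up data (event_count = i + 950 and data_value = i/100 + 1 for runs 0 to 99), selects run numbers with two separate conditions, and then combines the selections.

```python
runs = range(0, 100)

# Made-up condition values for each run
event_count = {i: i + 950 for i in runs}
data_value = {i: (i / 100.0) + 1 for i in runs}

# Each "query" selects run numbers using a single condition type
first = {i for i in runs if event_count[i] > 1000 and i <= 53}
second = {i for i in runs if 1.52 <= data_value[i] <= 1.7 and i < 55}

print(sorted(first))
print(sorted(second))
print(sorted(first & second))   # runs matching both selections (intersect)
print(sorted(first | second))   # runs matching at least one selection (union)
```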
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950) #event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1) #data_value in 1 - 2<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 < data_value < 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries in<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need for create_empty_sqlite.py)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system which stores some information about what is going on in the ''log_records'' table<br />
of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below)<br />
<br />
<br />
* Creating condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if each condition value creation had a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first None means there is no specific database object ID for this message. The last '0' means there is no specific run number for this message.<br />
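The order in which a user name could be picked up can be sketched as a small helper. This is only an illustration: ''resolve_user_name'' is a hypothetical function, and the "anonymous" fallback is an assumption, since the text above does not say what RCDB does when no name is given.

```python
import os


def resolve_user_name(explicit_name=None, environ=os.environ):
    """Pick a user name: an explicit argument wins, then RCDB_USER, then a fallback."""
    if explicit_name:
        return explicit_name
    # "anonymous" is an assumed fallback, used here only for illustration
    return environ.get("RCDB_USER", "anonymous")


print(resolve_user_name("john"))
print(resolve_user_name(None, {"RCDB_USER": "mary"}))
```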
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you do <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you do several actions with one object, like adding many conditions for one run or adding one condition to many runs, reusing the object could boost performance by up to 30% each. <br />
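Where the saving comes from can be seen by counting lookups in a toy store. ''CountingStore'' below is a hypothetical stand-in, not the RCDB provider: passing a name triggers a lookup on every call, while passing the already fetched object does not.

```python
class CountingStore(object):
    """Toy store that counts how often a condition type is looked up by name."""

    def __init__(self):
        self.lookups = 0
        self.types = {"my_value": {"name": "my_value"}}
        self.values = []

    def get_type(self, name):
        self.lookups += 1
        return self.types[name]

    def add(self, run_number, type_or_name, value):
        if isinstance(type_or_name, str):
            # A name was given, so the type has to be looked up first
            condition_type = self.get_type(type_or_name)
        else:
            condition_type = type_or_name
        self.values.append((run_number, condition_type["name"], value))


# 1. Passing the name: one lookup per call
by_name = CountingStore()
for run_number in range(100):
    by_name.add(run_number, "my_value", 10)

# 2. Passing the object: a single lookup, then reuse
by_object = CountingStore()
condition_type = by_object.get_type("my_value")
for run_number in range(100):
    by_object.add(run_number, condition_type, 10)

print(by_name.lookups)
print(by_object.lookups)
```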
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Make a lot of changes and commit them at one time gaining performance<br />
* Rollback changes<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code> <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
(!) Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
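The same two uses of a deferred commit, batching and rollback, can be tried with the standard sqlite3 module alone. The sketch below does not use RCDB or SQLAlchemy, and the table layout is made up for illustration: many rows are inserted and committed once, then a pending change is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER)")
conn.commit()

# Add many values, then commit once instead of once per value
conn.executemany(
    "INSERT INTO conditions VALUES (?, 'ac', ?)",
    [(run, run * 10) for run in range(1000)])

# Pending rows are already visible on the same connection
pending = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]
conn.commit()

# A pending change can be thrown away with rollback
conn.execute("UPDATE conditions SET value = -1")
conn.rollback()
restored = conn.execute(
    "SELECT value FROM conditions WHERE run_number = 5").fetchone()[0]

print(pending)
print(restored)
```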
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a ccdb-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash(csh) from the RCDB_HOME folder is sourced. It is actually located<br />
in the same place as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions with their types and descriptions (if any), use '''-l''' or '''--list''':<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To list condition names only, use '''--list-names''':<br />
<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Getting values ===<br />
To see all conditions and values for a run, type:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Give a run number and a condition name to get a single value:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
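If a JSON condition is printed as plain JSON text, as in the sample above, the captured output can be parsed in a script. The sketch below assumes exactly that and works on the sample string copied from above instead of calling '''rcnd''', so it runs without a database.

```python
import json

# Sample output of "rcnd 1000 components", copied from the listing above
rcnd_output = '{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}'

components = json.loads(rcnd_output)

# Names of all components of kind "ROC"
roc_names = sorted(name for name, kind in components.items() if kind == "ROC")
print(roc_names)
```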
<br />
<br />
<br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just a time. (You can always add time information to any other type)<br />
* blob - binary blob. Don't use it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please, make names shorter<br />
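The checkable parts of the naming policy can be turned into a small helper. ''check_condition_name'' is a hypothetical function written for this page and is not part of RCDB or '''rcnd'''; it only reports problems and leaves the decision to the caller.

```python
def check_condition_name(name):
    """Return a list of naming policy problems; an empty list means the name is fine."""
    problems = []
    if not name:
        problems.append("name is empty")
    if " " in name:
        problems.append("contains spaces; use '_' instead")
    if len(name) > 255:
        problems.append("longer than 255 characters")
    return problems


print(check_condition_name("event_count"))
print(check_condition_name("event count"))
```

The "full words are better" rule is a matter of taste and is deliberately left out of the check.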
<br />
<br />
<br />
Write a value for run 1000 for the condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without --replace, an error is raised if run 1000 already has a different value for 'my_value'.<br />
<br />
<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66209RCDB conditions python2015-04-15T21:29:39Z<p>Romanov: Command line tools added</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options for conditions include:<br />
<br />
* A condition can also hold the time of occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run. So it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers RCDB conditions python API, which provides complete tooling for conditions management.<br />
The API is developed using SQLAlchemy ORM, which unifies workflow for MySQL and SQLite databases<br />
(and many more, actually). RCDB API hides many complexities of SQLAlchemy and provides simple and very<br />
straightforward functions to manage conditions. But users can use all power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
# Once in a lifetime, create a condition type, that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
There is a small handy command line tool, '''rcnd''', that allows you to view RCDB conditions and write values:<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
rcnd --write 1663 100 event_count # Write condition value to run 100<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? They are not designed for large data sets that change rarely (where the value is the same for many runs).<br />
That is because each condition value is saved (and attached) independently for each run.<br />
<br />
In the case of bulk data, it is better to save it using other RCDB options. For example, RCDB provides a mechanism for saving files.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set<br />
the environment variables needed by rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH'''<br />
* appends the rcdb bin folder to '''$PATH'''<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is the MySQL server in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
An SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. Usage:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW examples ==<br />
<br />
===Python===<br />
The minimum needed to start with RCDB conditions, i.e. to store values and read them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create an RCDBProvider object, which connects to the DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
The example assumes that ''example.db'' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections walk through this example and explain each step in detail.<br />
<br />
<br />
<br />
=== Command line tools ===<br />
'''(!)''' The command line tool provides fewer possibilities for data manipulation than the Python API.<br />
<br />
<syntaxhighlight lang="bash"><br />
export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
rcnd --help # Gives you self descriptive help<br />
rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line instead of environment<br />
rcnd # Gives database statistics, number of runs and conditions<br />
rcnd 1000 # See all recorded values for run 1000<br />
rcnd 1000 event_count # See exact value of 'event_count' for run 1000<br />
<br />
# Creating a condition type (needs to be done once)<br />
rcnd --create my_value --type string --description "This is my value"<br />
<br />
# Write value for run 1000 for condition 'my_value'<br />
rcnd --write "value to write" --replace 1000 my_value<br />
<br />
# See all condition names and types in DB<br />
rcnd --list<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. Database<br />
parameters are passed to it as a connection string. It also carries the functions that manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulation of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite are the databases intended for use. Their connection strings are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite has no user_name, password or host, its connection string starts with three slashes (///).<br />
An absolute path to the file therefore gives four slashes (////):<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above the class constructor is used to connect to the database, but there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't actually connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects from the connection string. Thus, ''connect'' raises exceptions if the connection string has the wrong format<br />
or the required libraries are not installed. But if there is no physical connection to MySQL, or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS            CONDITIONS          CONDITION_TYPES<br />
number   <--    run_num             name<br />
                type_id      -->    field_type<br />
                *_value             is_many_per_run<br />
                time<br />
</pre><br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* The name and value type are stored in the condition_types table.<br />
* Finally, the values are stored in the conditions table; each record references a run and a condition type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes mirror this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number          # int - The run number<br />
    start_time      # datetime - Run start time<br />
    end_time        # datetime - Run end time<br />
    conditions      # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name             # str(max 255) - The name of the condition<br />
    value_type       # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run  # bool - True if the value is allowed many times per run<br />
    values           # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time         # datetime - time related to the condition (e.g. when it occurred)<br />
    run_number   # int - the run number<br />
<br />
    @property<br />
    value        # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value   # holds data if type is STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value    # holds data if type is INT_FIELD<br />
    float_value  # holds data if type is FLOAT_FIELD<br />
    bool_value   # holds data if type is BOOL_FIELD<br />
<br />
    run          # Run - Run object associated with the run_number<br />
    type         # ConditionType - link to the associated condition type<br />
    name         # str - link to type.name. See ConditionType.name<br />
    value_type   # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of these fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you read the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
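<br />
The lookup described above can be sketched with a small stand-alone class. This is only an illustration of the idea, not RCDB's actual code: ''ConditionSketch'' and its ''_STORAGE_FIELD'' table are made-up names, and the mapping simply copies the table above (TIME_FIELD is left out because the table does not list its storage field).<br />
<br />
```python
class ConditionSketch(object):
    """Toy stand-in for rcdb's Condition; NOT the real implementation."""

    # value_type -> name of the attribute that actually holds the data
    _STORAGE_FIELD = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
    }

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None

    @property
    def value(self):
        # Pick the storage attribute that matches the declared type
        return getattr(self, self._STORAGE_FIELD[self.value_type])

    @value.setter
    def value(self, new_value):
        setattr(self, self._STORAGE_FIELD[self.value_type], new_value)


event_count = ConditionSketch("int")
event_count.value = 1663

print(event_count.value)       # 1663
print(event_count.int_value)   # 1663
print(event_count.text_value)  # None
```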
<br />
<br />
'''Why is it so?''' Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses a so-called<br />
''"hybrid approach to the object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
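<br />
To make the idea concrete, here is a minimal sketch using Python's built-in ''sqlite3'' module. The three tables are a heavily simplified assumption modelled on the diagram in [[#Database structure]]; the real RCDB schema has more columns and may name them differently. The point is only that a typed column lets the database do the numeric comparison.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema: one typed column per kind of value
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        field_type TEXT
    );
    CREATE TABLE conditions (
        id INTEGER PRIMARY KEY,
        run_num INTEGER REFERENCES runs(number),
        type_id INTEGER REFERENCES condition_types(id),
        text_value TEXT,
        int_value INTEGER,
        float_value REAL
    );
""")

conn.executemany("INSERT INTO runs (number) VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO condition_types VALUES (1, 'event_count', 'int')")
conn.execute("INSERT INTO condition_types VALUES (2, 'comment', 'string')")

# Integer conditions go to int_value, string conditions to text_value
conn.executemany(
    "INSERT INTO conditions (run_num, type_id, int_value) VALUES (?, 1, ?)",
    [(1, 50000), (2, 250000), (3, 1200000)])
conn.execute(
    "INSERT INTO conditions (run_num, type_id, text_value) "
    "VALUES (1, 2, 'cosmics')")

# "Give me runs where event_count > 100 000"
rows = conn.execute("""
    SELECT c.run_num, c.int_value
    FROM conditions AS c
    JOIN condition_types AS t ON t.id = c.type_id
    WHERE t.name = 'event_count' AND c.int_value > 100000
    ORDER BY c.run_num
""").fetchall()

print(rows)  # [(2, 250000), (3, 1200000)]
```
<br />
Because ''event_count'' lives in an integer column, the database compares numbers. Had the values been stored as text, "50000" would compare as greater than "100000" and run 1 would wrongly match.<br />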
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" should be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (with parameter names added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
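<br />
Since the API does not check names, a small guard on the caller's side can catch accidental spaces before a condition type is created. The ''has_whitespace'' function below is a hypothetical helper, not something RCDB provides:<br />
<br />
```python
def has_whitespace(name):
    """Return True if the condition name contains any whitespace."""
    return any(ch.isspace() for ch in name)


candidates = ["category/sub/name", "category-sub_name", "event count"]
for name in candidates:
    verdict = "avoid" if has_whitespace(name) else "ok"
    print("%s -> %s" % (name, verdict))

# Names are case sensitive, so these would be two different conditions
print("event_count" == "Event_Count")  # False
```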
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use types are presented in the next section<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - The API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - The API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected of run conditions. Sometimes the intent is to<br />
have strictly one name-value pair per run; "''total_events''" or "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; hall "''temperature''" or "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows several values to be set for different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description of at most 255 characters that other users can see. It is optional, but<br />
filling it in is very good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of these field types should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value using the standard Python ''datetime'' (revisiting the first example):<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' property holds the time, and the time is passed as the<br />
value parameter of the ''add_condition'' function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You can then add many condition values to one run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If run has this condition, with the same value and actual_time the func. DOES NOTHING<br />
# 2. If run has this conditions but at different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays and dictionaries this is easily done with the standard ''json'' module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So serialization is up to you, which gives you better control over it.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''', and<br />
'''when you read the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the plain string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much like that for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object however you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it however you like<br />
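<br />
The steps above can be sketched as follows, assuming the object can be pickled. The pickle + base64 combination is just one possible choice, not something RCDB prescribes, and ''to_blob_string'' / ''from_blob_string'' are hypothetical helper names. The RCDB calls are shown only as comments so that the snippet runs without a database.<br />
<br />
```python
import base64
import pickle


def to_blob_string(obj):
    """Serialize any picklable object to a plain ASCII string."""
    return base64.b64encode(pickle.dumps(obj, protocol=2)).decode("ascii")


def from_blob_string(text):
    """Inverse of to_blob_string."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


payload = {"thresholds": (0.5, 1.5), "mask": frozenset([1, 4, 9])}
blob = to_blob_string(payload)

# With RCDB the string would be stored and read back like any other value:
#   db.create_condition_type("my_blob", ConditionType.BLOB_FIELD, False)
#   db.add_condition(1, "my_blob", blob)
#   blob = db.get_condition(1, "my_blob").value

restored = from_blob_string(blob)
print(restored == payload)  # True
```
<br />
Keep in mind that ''pickle.loads'' can execute arbitrary code, so only unpickle data from a database you trust.<br />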
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a very readable string nor JSON, but it should still be converted to some string. And I hope it will never be used.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly: with ''replace=True'' the value is replaced, otherwise an error is raised.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
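<br />
The behaviour for a single value per run can be summarized by a toy model in plain Python. It only mirrors the rules described above using a dictionary; it is not how RCDB is implemented, and ''add_single'' and ''OverrideError'' are made-up names (the latter stands in for the OverrideConditionValueError mentioned in the example).<br />
<br />
```python
class OverrideError(Exception):
    """Stand-in for RCDB's OverrideConditionValueError."""


def add_single(store, run, name, value, time=None, replace=False):
    """Toy model of add_condition for is_many_per_run=False.

    store is a plain dict {(run, name): (value, time)}.
    Returns "added", "unchanged" or "replaced".
    """
    key = (run, name)
    new = (value, time)
    if key not in store:
        store[key] = new
        return "added"
    if store[key] == new:
        # Same value and same time: nothing to do, no error
        return "unchanged"
    if not replace:
        raise OverrideError("%s already set for run %s" % (name, run))
    store[key] = new
    return "replaced"


store = {}
print(add_single(store, 1, "event_count", 1000))  # added
print(add_single(store, 1, "event_count", 1000))  # unchanged
try:
    add_single(store, 1, "event_count", 2222)
except OverrideError as error:
    print("error: %s" % error)  # error: event_count already set for run 1
print(add_single(store, 1, "event_count", 2222, replace=True))  # replaced
print(store[(1, "event_count")])  # (2222, None)
```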
<br />
<br />
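If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING<br />
# If the run has this condition but at a different time, the function adds this condition to the DB<br />
# If the run has this condition at this time but with a different value, the ''replace'' flag decides: with ''replace=True'' the value is replaced, otherwise an error is raised<br />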
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when the<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's see a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It actually isn't loaded even when we write something like <code>x = run.conditions</code>. Only the first time<br />
a real value is needed is the database queried for all conditions of that run.<br />
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially in the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible under the name '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with the <code>session.delete</code> function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
To get a Run object by run_number:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions are described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets the SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say that we want Run objects to be returned. At the same time, this tells which table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - the filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code><br />
<br />
We can use Run.conditions to get the conditions for each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query (each chained call continues the previous line)<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # unpack the single 'event_count' value of this run<br />
    event_count, = (condition.value for condition in run.conditions if condition.name=='event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one more SELECT query to the DB to get its<br />
conditions. That might be OK in many cases.<br />
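<br />
The cost of this pattern can be made visible with plain <code>sqlite3</code>. The sketch below is an assumption-laden illustration, not RCDB code: the table and column names are simplified stand-ins, and the statement counter is kept by hand. It is written for Python 3.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER);
""")
for number in range(50, 56):
    conn.execute("INSERT INTO runs VALUES (?)", (number,))
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)",
                 (number, number * 10))

# Variant 1: one query for the runs, then one more query per run (N + 1 in total)
statements = 1
per_run = {}
for (number,) in conn.execute("SELECT number FROM runs ORDER BY number DESC").fetchall():
    row = conn.execute(
        "SELECT int_value FROM conditions WHERE run_number = ? AND name = 'event_count'",
        (number,)).fetchone()
    statements += 1
    per_run[number] = row[0]

# Variant 2: a single JOIN returns the same information in one statement
joined = dict(conn.execute("""
    SELECT r.number, c.int_value
    FROM runs r
    JOIN conditions c ON c.run_number = r.number
    WHERE c.name = 'event_count'
"""))
```

For six runs the first variant issues seven statements, the second only one. This is the motivation for the join-based queries in the following sections.<br />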
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will also use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. This is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Second, when you add more logic, the query becomes bulky.<br />
<br />
<br />
Let's imagine the next example. We look for runs in the range 1000 to 2000 that have either event_count > 10000 or data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
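Note that instead of the usual logical operators (Python's '''and''' and '''or''', or '''&&''' and '''||''' in other languages), '''&''' and '''|''' are used.<br />
SQLAlchemy overloads these operators to build SQL expressions.<br />
<br />
Note also that each comparison should be put in parentheses, because in Python '''&''' and '''|''' bind more tightly than comparison operators. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but that doesn't improve readability.<br />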
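<br />
How operator overloading turns a Python expression into an SQL fragment can be sketched with two tiny classes. <code>Column</code> and <code>Expr</code> here are hypothetical and much simpler than SQLAlchemy's real expression classes; the snippet only shows the mechanism. Python's <code>and</code> and <code>or</code> cannot be overloaded, which is why <code>&</code> and <code>|</code> are used instead.<br />

```python
class Expr(object):
    """Hypothetical expression node that just carries an SQL string."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):      # called for  a & b
        return Expr("(%s AND %s)" % (self.sql, other.sql))

    def __or__(self, other):       # called for  a | b
        return Expr("(%s OR %s)" % (self.sql, other.sql))


class Column(object):
    """Hypothetical column whose comparisons build Expr objects instead of booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):       # called for  column == value
        return Expr("%s = %r" % (self.name, value))

    def __gt__(self, value):       # called for  column > value
        return Expr("%s > %r" % (self.name, value))


name = Column("name")
int_value = Column("int_value")

# The parentheses matter: without them Python would evaluate
# "event_count" & int_value first, because & binds tighter than == and >
clause = (name == "event_count") & (int_value > 10000)
sql = clause.sql

either = (name == "event_count") | (name == "data_value")
```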
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB ConditionType provides helper properties that make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. So it hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
db.session.query(Run).join(Run.conditions).join(Condition.type).filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field (here Condition.int_value) to compare<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. Such queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)  # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)  # data_value in range 1 - 1.99<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7 (between is inclusive)<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores information about what is going on in the '''log_records''' table of the same<br />
database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modifications through SQLAlchemy, like creating or deleting objects manually with session.commit(), are not<br />
logged either, so leaving a log record is up to the user here as well.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you call <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you do several actions with one object, like adding many conditions to one run or adding one condition to many runs, reusing the object can boost performance by up to 30% each. <br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition ===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB. <br />
<br />
To speed up the addition of conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Making a lot of changes and committing them all at once to gain performance<br />
* Being able to roll back changes<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code> <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
(!) Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
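<br />
The same commit, pending and rollback behaviour can be observed with plain <code>sqlite3</code>, independently of RCDB. This is a minimal sketch under assumptions: the single <code>conditions</code> table is a simplified stand-in, and the explicit <code>commit()</code> plays the role of <code>db.session.commit()</code>. It is written for Python 3.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
conn.commit()

# Batch: many inserts, a single commit at the end
for run_number in range(100):
    conn.execute("INSERT INTO conditions VALUES (?, 'ac', ?)", (run_number, 10))
conn.commit()
committed = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]

# A pending change is already visible on the same connection before it is committed
conn.execute("UPDATE conditions SET int_value = 30 WHERE run_number = 1")
pending = conn.execute(
    "SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]

# rollback discards everything done since the last commit
conn.rollback()
restored = conn.execute(
    "SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]
```

One commit covers all hundred inserts, and the uncommitted update disappears after the rollback.<br />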
<br />
<br />
<br />
<br />
<br />
== Command line tools ==<br />
While a CCDB-like shell is still in progress, you can inspect and manipulate run conditions using the '''rcnd''' tool.<br />
The tool is added to the PATH after environment.bash (or environment.csh) from the RCDB_HOME folder is sourced. It is actually located<br />
in the same place as environment.bash.<br />
<br />
<br />
(!) '''rcnd''' doesn't offer all possible data manipulations<br />
<br />
<br />
<br />
<syntaxhighlight lang="bash"><br />
> export RCDB_CONNECTION=mysql://rcdb@localhost/rcdb<br />
> rcnd --help # Gives you self descriptive help<br />
> rcnd -c mysql://rcdb@localhost/rcdb # -c flag sets connection string from command line<br />
> rcnd # Gives database statistics, number of runs and conditions<br />
</syntaxhighlight><br />
<br />
Output<br />
<pre><br />
Runs total: 1387<br />
Last run : 2472<br />
Condition types total: 9<br />
Conditions:<br />
<br />
components<br />
component_stats<br />
...<br />
</pre><br />
<br />
<br />
<br />
=== Getting condition names and info ===<br />
<br />
To list all conditions with their types and descriptions (if any), use '''-l''' or '''--list''':<br />
<pre><br />
> rcnd -l<br />
components (json)<br />
component_stats (json)<br />
event_count (int) - Run events count<br />
event_rate (float) - Events per sec.<br />
...<br />
</pre><br />
<br />
<br />
To list only the condition names, use '''--list-names''':<br />
<br />
<pre><br />
> rcnd --list-names<br />
components<br />
component_stats<br />
event_count<br />
event_rate<br />
...<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Getting values ===<br />
To see all conditions and values for a run, type:<br />
<br />
<pre><br />
> rcnd 1000 # See all recorded values for run 1000<br />
components = (json){"ROCBCAL2": "ROC", "ROCBCAL3": "ROC", "ROCBCAL1":...<br />
component_stats = (json){"ROCBCAL2": {"evt-number": 487, "data-rate": 300....<br />
event_count = 487<br />
rtvs = (json){"%(CODA_ROL1)": "/home/hdops/CDAQ/daq_dev_v0.31/d...<br />
run_config = 'pulser.conf'<br />
run_type = 'hd_bcal_n.ti'<br />
...<br />
</pre><br />
<br />
<br />
Give a run number and a condition name to get a single value:<br />
<pre><br />
> rcnd 1000 event_count<br />
487<br />
<br />
> rcnd 1000 components<br />
{"ROCBCAL2": "ROC", "ROCBCAL3": "ROC"}<br />
</pre><br />
<br />
<br />
<br />
<br />
=== Writing data ===<br />
<br />
Creating a condition type (needs to be done only once):<br />
<br />
<pre><br />
> rcnd --create my_value --type string --description "This is my value"<br />
ConditionType created with name='my_value', type='string', is_many_per_run='False'<br />
</pre><br />
<br />
Where --type is:<br />
<br />
* bool, int, float, string - basic types. float is the default<br />
* json - to store arrays or custom objects<br />
* time - to store just time. (You can always add time information to any other type)<br />
* blob - binary blob. Don't use it if possible<br />
<br />
<br />
Naming policy (not strict at all):<br />
<br />
# Don't use spaces. Use '_' instead<br />
# Full words are better. So 'event_count' is better than 'evt_cnt'<br />
# The maximum name length is 255 characters. But please make names shorter<br />
<br />
<br />
<br />
Write a value for run 1000 for condition 'my_value':<br />
<br />
<pre><br />
> rcnd --write "value to write" --replace 1000 my_value<br />
Written 'my_value' to run number 1000<br />
</pre><br />
<br />
Without --replace, an error is raised if run 1000 already has a different value for 'my_value'.<br />
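<br />
The overwrite rule can be written down in a few lines of Python. This is only a sketch of the rule as described here, not RCDB's implementation: <code>write_value</code>, <code>OverrideError</code> and the dictionary store are hypothetical, and the sketch assumes that writing an identical value again is accepted.<br />

```python
class OverrideError(Exception):
    """Hypothetical error type used only in this sketch."""


def write_value(store, run_number, name, value, replace=False):
    """Store a value unless a different one already exists and replace is False."""
    key = (run_number, name)
    if key in store and store[key] != value and not replace:
        raise OverrideError(
            "run %s already has a different value for '%s'" % (run_number, name))
    store[key] = value


store = {}
write_value(store, 1000, "my_value", "first")
write_value(store, 1000, "my_value", "first")        # same value again: accepted

try:
    write_value(store, 1000, "my_value", "second")   # different value, no replace
    refused = False
except OverrideError:
    refused = True

value_after_refusal = store[(1000, "my_value")]

# with replace=True the old value is overwritten
write_value(store, 1000, "my_value", "second", replace=True)
```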
<br />
<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=66012RCDB conditions python2015-04-06T18:15:38Z<p>Romanov: /* Introduction */</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options include:<br />
<br />
* A condition can also hold the time of occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run. It then looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. But users can still use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? Large data sets that rarely change between runs (the value is the same for many runs).<br />
Each RCDB condition value is saved independently and attached to a single run, so it is better to save such data using other RCDB tools.<br />
RCDB provides a File Saving mechanism for this kind of bulk data. Or maybe CCDB fits better in this case.<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set the<br />
environment variables needed to use rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' - to RCDB root directory,<br />
* appends '''$PYTHONPATH''' with $RCDB_HOME/python<br />
* appends '''$PATH''' with rcdb bin folder<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, there is a create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW example ==<br />
The minimum you need to start with RCDB conditions, to store values and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections go through this example and explain in detail what happens in it.<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It uses connection<br />
strings to pass database parameters to the class. It also carries functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite databases are considered. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because an SQLite connection string has no user_name, password or host part, it starts with three slashes ///.<br />
Thus there are four slashes //// when an absolute path to the file is given:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
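<br />
The slash counting becomes clearer when a connection string is taken apart like any other URL. The sketch below uses Python 3's standard <code>urllib.parse</code> only for illustration; <code>describe</code> is a hypothetical helper, the port 3306 is an example value, and SQLAlchemy has its own URL parser, so this is not how RCDB processes the string.<br />

```python
from urllib.parse import urlsplit


def describe(connection_string):
    """Split a connection string into its parts (illustration only)."""
    parts = urlsplit(connection_string)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # the first slash only separates the (possibly empty) host part
        # from the database name or file path
        "database": parts.path[1:],
    }


mysql = describe("mysql://user_name:password@host:3306/database")
relative = describe("sqlite:///example.db")
absolute = describe("sqlite:////home/user/example.db")
```

With an empty host part, two slashes open it and the third one closes it. A relative file name follows directly, while an absolute path brings its own leading slash, which makes four.<br />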
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above class constructor is used to connect to database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the connect function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, the ''connect'' function raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS            CONDITIONS          CONDITION_TYPES<br />
number   <--    run_num             name<br />
                type_id     -->     field_type<br />
                *_value             is_many_per_run<br />
                time<br />
</pre><br />
<br />
<br />
So when we talk about a name-value pair for a run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table; each of its records references a run and a condition_type.<br />
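<br />
The three-table layout can be tried out with plain <code>sqlite3</code>. This is a minimal sketch under assumptions: the column names follow the diagram above, while the <code>id</code> columns, the column types and the reduced set of value columns are guesses made for the illustration, not the actual RCDB schema. It is written for Python 3.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE condition_types (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        field_type TEXT,
        is_many_per_run INTEGER
    );
    CREATE TABLE conditions (
        id INTEGER PRIMARY KEY,
        run_num INTEGER REFERENCES runs(number),
        type_id INTEGER REFERENCES condition_types(id),
        int_value INTEGER,
        text_value TEXT,
        time TEXT
    );
    INSERT INTO runs (number) VALUES (100);
    INSERT INTO condition_types (id, name, field_type, is_many_per_run)
        VALUES (1, 'event_count', 'int', 0);
    INSERT INTO conditions (run_num, type_id, int_value) VALUES (100, 1, 1663);
""")

# "event_count for run 100" is a join of a value with its type, filtered by run
row = conn.execute("""
    SELECT t.name, c.int_value
    FROM conditions c
    JOIN condition_types t ON t.id = c.type_id
    WHERE c.run_num = ?
""", (100,)).fetchone()
```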
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties that reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number          # int - The run number<br />
    start_time      # datetime - Run start time<br />
    end_time        # datetime - Run end time<br />
    conditions      # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name            # str(max 255) - A name of condition<br />
    value_type      # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run # bool - True if the value is allowed many times per run<br />
    values          # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time            # datetime - time related to the condition (for example, when it occurred)<br />
    run_number      # int - the run number<br />
<br />
    @property<br />
    value           # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value      # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value       # holds data if type INT_FIELD<br />
    float_value     # holds data if type FLOAT_FIELD<br />
    bool_value      # holds data if type BOOL_FIELD<br />
<br />
    run             # Run - Run object associated with the run_number<br />
    type            # ConditionType - link to associated condition type<br />
    name            # str - link to type.name. See ConditionType.name<br />
    value_type      # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, in reality the data is stored in one of the fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call ''Condition.value'' property, Condition class checks for ''type.value_type'' and returns<br />
an appropriate ''xxx_value''.<br />
<br />
<br />
'''Why is it so?''' - because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To allow this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
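<br />
The dispatch from ''value_type'' to a storage field can be sketched in a few lines. <code>ConditionSketch</code> is a hypothetical stand-in, not the real <code>Condition</code> class: it follows the storage table above, leaves out the time type (which that table does not list), and returns JSON as raw text.<br />

```python
# value_type -> storage field, following the storage table
STORAGE_FIELD = {
    "string": "text_value",
    "json": "text_value",
    "blob": "text_value",
    "int": "int_value",
    "float": "float_value",
    "bool": "bool_value",
}


class ConditionSketch(object):
    """Hypothetical stand-in for rcdb's Condition class."""

    def __init__(self, value_type, raw):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        # store the raw value in the field that matches the declared type
        setattr(self, STORAGE_FIELD[value_type], raw)

    @property
    def value(self):
        # read back from whichever field the type points to
        return getattr(self, STORAGE_FIELD[self.value_type])


event_count = ConditionSketch("int", 487)
run_config = ConditionSketch("string", "pulser.conf")
```

Because the integer ends up in a real integer field, a comparison such as <code>int_value > 100000</code> can be evaluated by the database itself.<br />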
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" should be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention and there is no built-in check for<br />
spaces. However, spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run:<br />
<br />
* '''False''' - the API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - the API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are assumed for run conditions. Sometimes it is intended to<br />
have strictly one name-value pair for a run; "''total_events''" or "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; hall "''temperature''" or "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows setting several values at different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human readable description of at most 255 characters that other users can see. It is optional, but it is very<br />
good practice to fill it in.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types one of the fields should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that. Let's look at the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
<br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' field holds the time information, and the time can be passed as<br />
the value parameter of the add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You are then able to add many condition values per run by specifying a time for each of them.<br />
<br />
<br />
'''(!)''' if '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects,<br />
even if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If run has this condition, with the same value and actual_time the func. DOES NOTHING<br />
# 2. If run has this conditions but at different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended for storing arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This is done to give you better control over serialization.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings are returned as unicode after JSON deserialization (look at u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the ordinary string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using pyYAML to deserialize to strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project that can be installed<br />
via pip. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much like the one for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in whatever way you like<br />
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. And I hope it will never be used.<br />
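<br />
One possible way to turn binary data into a string for a BLOB_FIELD is sketched below. The choice of <code>struct</code> and <code>base64</code>, and the variable name <code>pedestals</code>, are assumptions made for illustration; RCDB does not prescribe any particular serialization.<br />

```python
import base64
import struct

# Assumption: the data is a short list of doubles. struct + base64 is only
# one possible serialization; RCDB does not prescribe it.
pedestals = [1.5, 2.25, -3.0]

# Serialize: pack to bytes, then encode as ASCII text suitable for a text field
packed = struct.pack("<%dd" % len(pedestals), *pedestals)
blob_text = base64.b64encode(packed).decode("ascii")

# ... blob_text would be passed to add_condition and read back
# from Condition.value at this point ...

# Deserialize: decode the text back to bytes and unpack the doubles
raw = base64.b64decode(blob_text)
restored = list(struct.unpack("<%dd" % (len(raw) // 8), raw))
print(restored)
```

The encoded text is opaque to a human reader, which is exactly the case BLOB_FIELD is meant to flag.<br />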
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING<br />
# If the run has this condition but at a different time, the function adds this condition to the DB<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag and behaves accordingly<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
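<br />
The rules for both modes can be summarized in a small pure-Python model. The function <code>add_value</code>, the <code>OverrideError</code> class and the list-of-pairs storage are hypothetical names invented for this sketch; they only imitate the behaviour described in this section and are not RCDB code.<br />

```python
class OverrideError(Exception):
    """Stand-in for the error raised when a value exists and replace is False."""


def add_value(store, is_many_per_run, value, time=None, replace=False):
    """Apply the rules to `store`, a list of (value, time) pairs for one run and name."""
    if is_many_per_run:
        # Only a record with the same time can conflict
        conflicts = [i for i, (_, t) in enumerate(store) if t == time]
    else:
        # Any existing record conflicts, because only one is allowed
        conflicts = list(range(len(store)))

    if not conflicts:
        store.append((value, time))          # first addition
    elif store[conflicts[0]] == (value, time):
        pass                                 # identical record: do nothing
    elif replace:
        store[conflicts[0]] = (value, time)  # overwrite the existing record
    else:
        raise OverrideError("value exists; pass replace=True to overwrite")


t1, t2 = "time1", "time2"   # placeholders standing in for datetime objects

# Many values per run: a different time adds a record, the same time conflicts
multi = []
add_value(multi, True, 1000)
add_value(multi, True, 1000)                 # same value and time: nothing happens
try:
    add_value(multi, True, 2222)             # another value for time None
except OverrideError:
    add_value(multi, True, 2222, replace=True)
add_value(multi, True, 3333, t1)
add_value(multi, True, 4444, t2)
print(multi)

# Single value per run: a different value or time needs replace=True
single = []
add_value(single, False, 1, t1)
add_value(single, False, 5, t2, replace=True)
print(single)
```

In this model the only difference between the two modes is what counts as a conflict: any existing record, or only a record with the same time.<br />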
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
the objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't loaded even when we do something like x=run.conditions. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
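<br />
The idea of loading on first use can be imitated in a few lines of plain Python. The class <code>LazyConditions</code> and the function <code>fake_query</code> are hypothetical and only mimic the observable behaviour; SQLAlchemy's real implementation is different and far more elaborate.<br />

```python
class LazyConditions(object):
    """Sketch of a collection that is fetched only when its content is needed."""

    def __init__(self, loader):
        self._loader = loader    # function that performs the (expensive) query
        self._items = None       # nothing loaded yet

    def _load(self):
        if self._items is None:
            self._items = self._loader()
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


queries = []   # records every simulated database query


def fake_query():
    queries.append("SELECT ... FROM conditions WHERE run_number = 1")
    return ["event_count", "beam_current"]


conditions = LazyConditions(fake_query)
x = conditions                           # just another reference: no query yet
queries_before = len(queries)

names = [name for name in conditions]    # content is needed: one query is issued
again = list(conditions)                 # already loaded: no second query
queries_after = len(queries)

print(queries_before)
print(queries_after)
```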
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should be avoided. But sometimes it is needed, especially during development and debugging.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible by '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and manipulating SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions are described further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets the SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say that we want Run objects to be returned. At the same time we say which table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - the filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the number of runs with <code>query.count()</code><br />
<br />
We can use Run.conditions to get the conditions for each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # pick the single event_count condition of this run<br />
    event_count, = (condition.value for condition in run.conditions if condition.name=='event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one SELECT query to the DB to get its<br />
conditions. It might be OK for many cases.<br />
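<br />
The cost of one extra query per run can be made visible with a stand-alone sqlite3 sketch. The two tables below are an illustrative assumption, not RCDB's real schema, and the statement counter is kept by hand.<br />

```python
import sqlite3

# Illustrative schema (an assumption, not RCDB's real tables)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (number INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
for number in range(50, 56):
    conn.execute("INSERT INTO runs VALUES (?)", (number,))
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)", (number, number * 10))

# Pattern described above: one query for the runs, then one more query per run
statements = 1
per_run = {}
for (number,) in conn.execute("SELECT number FROM runs ORDER BY number DESC").fetchall():
    row = conn.execute(
        "SELECT int_value FROM conditions WHERE run_number = ? AND name = 'event_count'",
        (number,)).fetchone()
    statements += 1
    per_run[number] = row[0]
print(statements)

# Alternative: a single JOIN returns the same information in one statement
joined = dict(conn.execute(
    "SELECT r.number, c.int_value FROM runs r "
    "JOIN conditions c ON c.run_number = r.number "
    "WHERE c.name = 'event_count'"))
print(joined == per_run)
```

For six runs the first pattern issues seven statements; the number grows linearly with the number of selected runs, while the JOIN stays at one.<br />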
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
.filter(ConditionType.name == "event_count")\<br />
.filter(Condition.int_value > 100000)\<br />
.order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for the results to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Let's imagine the next example. We look for runs in the range 1000 to 2000 with event_count > 10000, some data_value in the range 1.2 to 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
.filter(Run.number.between(1000, 2000))\<br />
.filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
.order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that instead of Python's '''and''' and '''or''' keywords, the '''&''' and '''|''' operators are used.<br />
SQLAlchemy overloads these operators to build SQL expressions.<br />
<br />
Note also that each comparison must be enclosed in parentheses, because '''&''' and '''|''' bind more tightly than comparison operators. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
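<br />
Why the parentheses matter can be shown with a toy expression builder. The class <code>Expr</code> is a hypothetical stand-in written for this sketch; it is not SQLAlchemy's implementation and only overloads the same operators to produce SQL-like text.<br />

```python
class Expr(object):
    """Tiny expression builder that overloads operators, as an ORM column does."""

    def __init__(self, sql):
        self.sql = sql

    def __gt__(self, other):
        return Expr("%s > %r" % (self.sql, other))

    def __eq__(self, other):
        return Expr("%s = %r" % (self.sql, other))

    def __and__(self, other):
        return Expr("(%s AND %s)" % (self.sql, other.sql))

    def __or__(self, other):
        return Expr("(%s OR %s)" % (self.sql, other.sql))


name = Expr("name")
int_value = Expr("int_value")

# With parentheses each comparison is evaluated first, then combined by &
good = (name == "event_count") & (int_value > 10000)
print(good.sql)

# Without parentheses Python evaluates "event_count" & int_value first,
# because & binds more tightly than == and >, and that operation fails
failed = False
try:
    name == "event_count" & int_value > 10000
except TypeError:
    failed = True
print(failed)
```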
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB's ConditionType provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
....query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects Condition.int_value for the comparison.<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. The queries can, however, be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)        # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)   # data_value in range 1 - 1.99<br />
<br />
<br />
# Demonstrates ConditionType query helpers<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of this two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries:<br />
<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need for create_empty_sqlite.py)<br />
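<br />
For readers who want to see what combining two selections by intersection and union means at the SQL level, here is a stand-alone sketch with the same dummy numbers. It uses only Python's sqlite3 module; the single-table layout and the helper <code>run_numbers</code> are assumptions for illustration and say nothing about the SQL that RCDB or SQLAlchemy actually emit.<br />

```python
import sqlite3

# Illustrative table (an assumption, not RCDB's schema), filled with the
# same dummy values as the example above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, "
             "int_value INTEGER, float_value REAL)")
for i in range(0, 100):
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?, NULL)", (i, i + 950))
    conn.execute("INSERT INTO conditions VALUES (?, 'data_value', NULL, ?)", (i, i / 100.0 + 1))

by_event_count = ("SELECT run_number FROM conditions "
                  "WHERE name = 'event_count' AND int_value > 1000 AND run_number <= 53")
by_data_value = ("SELECT run_number FROM conditions "
                 "WHERE name = 'data_value' AND float_value BETWEEN ? AND ? "
                 "AND run_number < 55")
bounds = (1.52, 1.7)


def run_numbers(sql, params=()):
    """Execute a statement and return the selected run numbers in ascending order."""
    return sorted(row[0] for row in conn.execute(sql, params))


first = run_numbers(by_event_count)
second = run_numbers(by_data_value, bounds)
intersect = run_numbers(by_event_count + " INTERSECT " + by_data_value, bounds)
union = run_numbers(by_event_count + " UNION " + by_data_value, bounds)

print(first)
print(second)
print(intersect)
print(union)
```

The four printed lists contain the same run numbers as the output shown above.<br />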
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores some information about what is going on in the ''log_records''<br />
table of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set RCDB_USER environment variable to give RCDB your user name<br />
# another option is to pass it to the constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first None means there is no specific database object ID for this message. The last '0' means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you do <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you perform several actions with one object, like adding many conditions for one run or adding one condition to many runs, reusing the object can improve performance by up to 30% for each reused object. <br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition ===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> goes to committing changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Make many changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit changes with <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>. <br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
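<br />
The deferred-commit behaviour can also be tried outside of RCDB. The following is a minimal sketch that uses Python's standard ''sqlite3'' module directly instead of SQLAlchemy; the ''conditions'' table and its columns are simplified stand-ins chosen for illustration, not the real RCDB schema. It shows the same two effects as the test above: pending changes are visible to the connection that made them, and a rollback returns to the last committed state.<br />
<br />
```python
import sqlite3

# In-memory database with a simplified, hypothetical conditions table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
conn.execute("INSERT INTO conditions VALUES (1, 'ac', 10)")
conn.commit()  # "ac"=10 is now stored


def read_ac(connection):
    """Return the value of 'ac' for run 1 as seen by this connection."""
    cursor = connection.execute(
        "SELECT int_value FROM conditions WHERE run_number = 1 AND name = 'ac'")
    return cursor.fetchone()[0]


# Change the value twice without committing
conn.execute("UPDATE conditions SET int_value = 20 WHERE run_number = 1 AND name = 'ac'")
conn.execute("UPDATE conditions SET int_value = 30 WHERE run_number = 1 AND name = 'ac'")
pending = read_ac(conn)   # the same connection sees the pending change

# Discard everything done since the last commit
conn.rollback()
restored = read_ac(conn)

print(pending)
print(restored)
```
<br />
Grouping many changes between two commits is also what saves time: the cost of a commit is paid once for the whole batch instead of once per value.<br />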
<br />
<br />
'''(!)''' Note, however, that more complex scenarios with uncommitted objects have not been tested.<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=64044RCDB conditions python2015-03-07T17:40:00Z<p>Romanov: </p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are represented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options of conditions include:<br />
<br />
* A condition can also hold the time at which it occurred: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, so it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers RCDB conditions python API, which provides complete tooling for conditions management.<br />
The API is developed using SQLAlchemy ORM, which unifies workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple,<br />
straightforward functions to manage conditions. Users can still use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
# Once in a lifetime, create a condition type, that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? If data is bulky and changes rarely (the value is the same for many runs),<br />
it is better not to save it as conditions, because each value is saved independently and attached to a run.<br />
RCDB provides a file-saving mechanism for this kind of bulk data. Or maybe CCDB fits better in this case.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set the<br />
environment variables for rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory,<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH''',<br />
* appends the rcdb bin folder to '''$PATH'''.<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, use the create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW example ==<br />
This is all you need, at least to start with RCDB conditions, to store values and get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections cover this example and explain each step thoroughly.<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It takes a connection<br />
string that carries the database parameters. It also provides the functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, we consider only MySQL and SQLite databases. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite doesn't have a user_name and password, its connection string starts with three slashes ///.<br />
Thus there are four slashes //// when the path to the file is absolute:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
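<br />
The slash rule can be checked with a few lines of Python. This is a minimal sketch: ''sqlite_url'' is a hypothetical helper introduced here for illustration, it is not part of the RCDB API, and it assumes POSIX-style paths.<br />
<br />
```python
def sqlite_url(path):
    """Build an SQLite connection string from a file path.

    Hypothetical helper for illustration; not part of RCDB.
    The prefix always contributes three slashes, so an absolute
    POSIX path (which starts with its own slash) yields four.
    """
    return "sqlite:///" + path


# Relative path: three slashes in total
print(sqlite_url("example.db"))
# Absolute path: four slashes in total
print(sqlite_url("/home/user/example.db"))
```
<br />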
<br />
<br />
More about connections could be found in<br />
[[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation]]<br />
<br />
<br />
In the example above class constructor is used to connect to database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
#disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects from the connection string. Thus, ''connect'' raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case, the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
 RUNS          CONDITIONS          CONDITION_TYPES<br />
 number  <--   run_num             name<br />
               type_id      -->    field_type<br />
               *_value             is_many_per_run<br />
               time<br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, where each record references a run and a condition_type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number # int - The run number<br />
    start_time # datetime - Run start time<br />
    end_time # datetime - Run end time<br />
    conditions # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name # str(max 255) - A name of condition<br />
    value_type # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run # bool - True if the value is allowed many times per run<br />
    values # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time # datetime - time related to condition (for example, when it occurred)<br />
    run_number # int - the run number<br />
<br />
    @property<br />
    value # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value # holds data if type INT_FIELD<br />
    float_value # holds data if type FLOAT_FIELD<br />
    bool_value # holds data if type BOOL_FIELD<br />
<br />
    run # Run - Run object associated with the run_number<br />
    type # ConditionType - link to associated condition type<br />
    name # str - link to type.name. See ConditionType.name<br />
    value_type # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, in reality data is stored in one of the fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call ''Condition.value'' property, Condition class checks for ''type.value_type'' and returns<br />
an appropriate ''xxx_value''.<br />
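<br />
The dispatch can be pictured with a small stand-alone class. This is a minimal sketch, not the RCDB implementation: ''ConditionSketch'' is a hypothetical toy class, and its mapping simply mirrors the table above (time values are left out because the table does not list a storage field for them).<br />
<br />
```python
class ConditionSketch(object):
    """Toy stand-in for Condition, for illustration only (not the RCDB class)."""

    # Maps a value type name to the attribute that holds the data
    _STORAGE = {
        "string": "text_value",
        "json": "text_value",
        "blob": "text_value",
        "int": "int_value",
        "float": "float_value",
        "bool": "bool_value",
    }

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None

    @property
    def value(self):
        # Pick the storage attribute that matches the declared type
        return getattr(self, self._STORAGE[self.value_type])

    @value.setter
    def value(self, new_value):
        setattr(self, self._STORAGE[self.value_type], new_value)


c = ConditionSketch("int")
c.value = 1663
print(c.int_value)   # the data lands in int_value
print(c.text_value)  # the other storage fields stay empty
```
<br />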
<br />
<br />
'''Why is it so?''' Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time, we would like to store strings and more general data as blobs. To achieve this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
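<br />
To make the benefit of typed columns concrete, here is a minimal sketch using Python's standard ''sqlite3'' module. The two tables are a simplified, hypothetical version of the layout described above (column names such as ''run_number'' and ''value_type'' are assumptions for this example, not the real RCDB schema). Because ''event_count'' is kept in an integer column, the database compares it numerically:<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT, value_type TEXT);
    CREATE TABLE conditions (
        run_number  INTEGER,
        type_id     INTEGER REFERENCES condition_types(id),
        text_value  TEXT,
        int_value   INTEGER,
        float_value REAL
    );
""")
conn.execute("INSERT INTO condition_types VALUES (1, 'event_count', 'int')")
conn.execute("INSERT INTO condition_types VALUES (2, 'comment', 'string')")

# Integers go to int_value, strings go to text_value
rows = [
    (100, 1, None, 1663, None),
    (101, 1, None, 250000, None),
    (102, 1, None, 1200000, None),
    (101, 2, "beam unstable", None, None),
]
conn.executemany("INSERT INTO conditions VALUES (?, ?, ?, ?, ?)", rows)

# "Give me runs where event_count > 100 000"
query = """
    SELECT c.run_number
    FROM conditions c
    JOIN condition_types t ON t.id = c.type_id
    WHERE t.name = 'event_count' AND c.int_value > 100000
    ORDER BY c.run_number
"""
big_runs = [row[0] for row in conn.execute(query)]
print(big_runs)
```
<br />
If all values were stored as text in a single column, the same comparison would need a cast on every row or would compare strings character by character.<br />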
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" must be created first. This is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (we add parameter names here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
value_type=ConditionType.INT_FIELD,<br />
is_many_per_run=False,<br />
description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention, and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use types are presented in the next section<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - API allows '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Two different behaviours are expected from run conditions. Sometimes it is intended to<br />
have strictly one name-value pair per run; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; hall "''temperature''" and "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows setting several values at different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - 255 chars max human readable description, that other users can see. It is optional but it is very<br />
good practice to fill it.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types one of the fields should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's see an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that (let's revisit the first example):<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case ''Condition.value'' field will have time information and time can be passed as<br />
value parameter of add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You can then add many condition values to one run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If the run has this condition with the same value and time, the function DOES NOTHING<br />
# 2. If the run has this condition but at a different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015,9,1,14,21,01, 222)<br />
time2 = datetime(2015,9,1,14,21,01, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
The example from [[https://docs.python.org/2/library/json.html python 2.7 documentation]]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This gives you better control over it.<br />
It means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''', and<br />
'''after you get the condition back, ''Condition.value'' contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the restored data, because Python handles unicode strings seamlessly.<br />
As the example shows, we use the ordinary string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that]]<br />
<br />
Using pyYAML to deserialize to strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom python objects to database, jsonpickle package could be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[[http://jsonpickle.github.io jsonpickle Documentation]]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if data doesn't fit into the string or JSON? There is ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much the same as for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object however you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it however you like<br />
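<br />
The serialize and deserialize steps can be sketched as follows. This is only one possible choice, shown under stated assumptions: the payload (three floats) is invented for the example, and ''struct'' plus ''base64'' from the standard library are used here simply because they turn binary data into a plain ASCII string; RCDB does not prescribe any particular encoding.<br />
<br />
```python
import base64
import struct

# Hypothetical payload: three 32-bit floats from some binary readout
samples = [1.5, -2.25, 1024.0]

# Serialize: pack to bytes, then base64-encode to get a plain ASCII string
raw = struct.pack("<3f", *samples)
text = base64.b64encode(raw).decode("ascii")
print(text)

# ... here `text` would be passed as the value of a BLOB_FIELD condition ...

# Deserialize: reverse the two steps
restored = list(struct.unpack("<3f", base64.b64decode(text)))
print(restored)
```
<br />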
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a very readable string nor JSON. But it still has to be converted to some string. And I hope it will never be used.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if the condition value for this run with this name already exists in the DB?<br />
<br />
In general, to replace value ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run already has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly: the value is replaced if ''replace=True'', otherwise an error is raised.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015,9,1,14,21,01, 222)<br />
time2 = datetime(2015,9,1,14,21,01, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
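<br />
The single-value rules can be summarized as a small in-memory model. This is a toy sketch of the rules as stated in this section, not the RCDB implementation: ''add_single'', ''OverrideError'' and the plain dict used as storage are all hypothetical names introduced for illustration.<br />
<br />
```python
class OverrideError(Exception):
    """Stand-in for the error raised when a value would be silently overwritten."""


def add_single(store, run, name, value, time=None, replace=False):
    """Toy model of the single-value-per-run rules; `store` is a plain dict."""
    key = (run, name)
    if key not in store:
        store[key] = (value, time)      # first addition
    elif store[key] == (value, time):
        pass                            # same value and time: do nothing
    elif replace:
        store[key] = (value, time)      # differs, replacement requested
    else:
        raise OverrideError("value or time differs; pass replace=True")


store = {}
add_single(store, 1, "event_count", 1000)
add_single(store, 1, "event_count", 1000)        # no-op
try:
    add_single(store, 1, "event_count", 2222)    # differs, replace not set
except OverrideError as error:
    print("refused: {}".format(error))
add_single(store, 1, "event_count", 2222, replace=True)
print(store[(1, "event_count")])
```
<br />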
<br />
<br />
If many condition values are allowed for the run (''is_many_per_run=True''):<br />
<br />
# If the run has this condition with the same value and the same time, the function does nothing.<br />
# If the run has this condition but at a different time, the function adds the new condition to the DB.<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag, as in the single-value case.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
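<br />
The rules above can be summarised in a few lines of plain python. This is only a sketch of the decision logic as described in this section, not the actual RCDB implementation: the <code>decide</code> function and its return strings are made up for illustration, and <code>OverrideConditionValueError</code> is redefined locally as a stand-in.<br />
<br />
<syntaxhighlight lang="python"><br />
class OverrideConditionValueError(Exception):<br />
    """Local stand-in for the RCDB exception of the same name."""<br />
<br />
<br />
def decide(existing, value, time=None, replace=False, is_many_per_run=False):<br />
    """Return 'insert', 'nothing' or 'replace' for a new (value, time) pair.<br />
<br />
    existing - list of (value, time) tuples already stored for this run and name<br />
    """<br />
    if is_many_per_run:<br />
        # only a record with the same time can conflict<br />
        candidates = [record for record in existing if record[1] == time]<br />
    else:<br />
        # any record of this run conflicts<br />
        candidates = list(existing)<br />
<br />
    if not candidates:<br />
        return "insert"<br />
    if (value, time) in candidates:<br />
        return "nothing"<br />
    if replace:<br />
        return "replace"<br />
    raise OverrideConditionValueError("value or time differs, pass replace=True")<br />
<br />
<br />
existing = [(1000, None)]<br />
same = decide(existing, 1000)                       # "nothing"<br />
replaced = decide(existing, 2222, replace=True)     # "replace"<br />
</syntaxhighlight><br />
<br />
Calling <code>decide(existing, 2222)</code> without ''replace=True'' raises the error, which mirrors the third line of the examples above.<br />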
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links python classes to the related database tables. It loads data from the DB into objects and, when<br />
the objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together, which makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't loaded even when we do something like <code>x = run.conditions</code>. Only the first time<br />
a real value is needed is the database queried for all conditions of that run.<br />
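<br />
The idea of loading on first use can be modelled with a small toy class. This is not how SQLAlchemy is implemented; <code>LazyConditions</code> and its <code>queries</code> counter are hypothetical names used only to illustrate the behaviour described above:<br />
<br />
<syntaxhighlight lang="python"><br />
class LazyConditions(object):<br />
    """Toy collection that calls its loader only when items are really needed."""<br />
<br />
    def __init__(self, loader):<br />
        self._loader = loader<br />
        self._items = None<br />
        self.queries = 0          # how many times the "database" was asked<br />
<br />
    def _load(self):<br />
        if self._items is None:<br />
            self.queries += 1<br />
            self._items = self._loader()<br />
        return self._items<br />
<br />
    def __iter__(self):<br />
        return iter(self._load())<br />
<br />
<br />
conditions = LazyConditions(lambda: [("event_count", 1000), ("my_val", 2000)])<br />
x = conditions                                    # nothing is loaded yet<br />
queries_before = conditions.queries<br />
names = [name for name, value in conditions]      # first real use triggers the load<br />
values = [value for name, value in conditions]    # second use reuses loaded items<br />
queries_after = conditions.queries<br />
</syntaxhighlight><br />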
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially in the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible by '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with the ''session.delete'' function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print(run.number)<br />
print(run.start_time)<br />
print(run.end_time)<br />
print(run.conditions)   # conditions are covered in more detail below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print(query.count())<br />
<br />
# get first run from selected<br />
print(query.first())<br />
<br />
# get all runs that match the criteria<br />
print(query.all())<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can actually get objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the number of runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query (desc comes from SQLAlchemy: from sqlalchemy import desc)<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one more SELECT query to the DB to get its<br />
conditions. That might be OK in many cases.<br />
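<br />
The cost of one SELECT per run can be made visible without RCDB at all. The sketch below uses the standard sqlite3 module and a simplified, made-up two-table layout (it is not the real RCDB schema); the <code>select</code> helper simply records every statement it sends:<br />
<br />
<syntaxhighlight lang="python"><br />
import sqlite3<br />
<br />
conn = sqlite3.connect(":memory:")<br />
conn.execute("CREATE TABLE runs (number INTEGER PRIMARY KEY)")<br />
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")<br />
for number in range(50, 56):<br />
    conn.execute("INSERT INTO runs VALUES (?)", (number,))<br />
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)", (number, number * 10))<br />
<br />
statements = []   # every SELECT goes through select() and is recorded here<br />
<br />
def select(sql, args=()):<br />
    statements.append(sql)<br />
    return conn.execute(sql, args).fetchall()<br />
<br />
# one query for the runs, then one more query per run<br />
per_run = {}<br />
for (number,) in select("SELECT number FROM runs ORDER BY number DESC"):<br />
    rows = select("SELECT int_value FROM conditions "<br />
                  "WHERE run_number = ? AND name = 'event_count'", (number,))<br />
    per_run[number] = rows[0][0]<br />
queries_per_run = len(statements)<br />
<br />
# the same data fetched with a single JOIN<br />
del statements[:]<br />
joined = dict(select("SELECT r.number, c.int_value FROM runs r "<br />
                     "JOIN conditions c ON c.run_number = r.number "<br />
                     "WHERE c.name = 'event_count'"))<br />
queries_joined = len(statements)<br />
</syntaxhighlight><br />
<br />
For 6 runs the first approach sends 7 statements and the second sends 1, while both return the same values.<br />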
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, it is possible to make use of the full<br />
power of SQLAlchemy queries.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print(query.count())<br />
<br />
# get first run from selected<br />
print(query.first())<br />
<br />
# get all runs that match the criteria<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will also use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you can probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Consider the next example. We look for runs in the range 1000 to 2000 that have event_count > 10000 or some data_value between 1.2 and 2.4 (the two conditions are combined with OR in the query below):<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of python's '''and''' and '''or''' (or the '''&&''' and '''||''' of other languages).<br />
SQLAlchemy overloads these operators to build SQL AND and OR expressions.<br />
<br />
Note also that each comparison should be in parentheses, because '''&''' and '''|''' bind tighter than '''==''' or '''>''' in python. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
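<br />
Why the parentheses matter can be shown with a toy model of operator overloading. The <code>Expr</code> and <code>Column</code> classes below are hypothetical and much simpler than what SQLAlchemy really does; they only build SQL-like text:<br />
<br />
<syntaxhighlight lang="python"><br />
class Expr(object):<br />
    """A piece of SQL-like text that can be combined with & and |."""<br />
<br />
    def __init__(self, sql):<br />
        self.sql = sql<br />
<br />
    def __and__(self, other):<br />
        return Expr("(%s AND %s)" % (self.sql, other.sql))<br />
<br />
    def __or__(self, other):<br />
        return Expr("(%s OR %s)" % (self.sql, other.sql))<br />
<br />
<br />
class Column(object):<br />
    """Comparisons on a column produce an Expr instead of True/False."""<br />
<br />
    def __init__(self, name):<br />
        self.name = name<br />
<br />
    def __eq__(self, other):<br />
        return Expr("%s = %r" % (self.name, other))<br />
<br />
    def __gt__(self, other):<br />
        return Expr("%s > %r" % (self.name, other))<br />
<br />
<br />
name = Column("name")<br />
int_value = Column("int_value")<br />
<br />
# with parentheses each comparison is evaluated first, then combined<br />
good = (name == "event_count") & (int_value > 10000)<br />
<br />
try:<br />
    # without parentheses python evaluates "event_count" & int_value first<br />
    bad = name == "event_count" & int_value > 10000<br />
except TypeError:<br />
    bad = None<br />
</syntaxhighlight><br />
<br />
Here <code>good.sql</code> is <code>(name = 'event_count' AND int_value > 10000)</code>, while the version without parentheses fails with a TypeError in this toy model.<br />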
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB ConditionType provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
db.session.query(Run).join(Run.conditions).join(Condition.type).filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field (Condition.int_value here) to compare.<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. The queries can be combined with the '''union''' or<br />
'''intersect''' methods later.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)        # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)   # data_value in range 1.0 - 1.99<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <= 53)<br />
print(query.all())<br />
<br />
# select runs where 1.52 < data_value < 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print(query2.all())<br />
<br />
# combine results of these two queries<br />
print("Results intersect:")<br />
print(query.intersect(query2).all())<br />
print("Results union:")<br />
print(query.union(query2).all())<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
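<br />
The combined results behave like set operations on run numbers. A minimal illustration with plain python sets, using the run numbers from the output above (this only mirrors the semantics; the real <code>intersect</code> and <code>union</code> are executed by the database):<br />
<br />
<syntaxhighlight lang="python"><br />
first = {51, 52, 53}     # runs returned by the event_count query<br />
second = {52, 53, 54}    # runs returned by the data_value query<br />
<br />
both = sorted(first & second)      # runs present in both results<br />
either = sorted(first | second)    # runs present in at least one result<br />
</syntaxhighlight><br />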
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system, which stores some information about what is going on in the '''log_records''' table<br />
of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation had a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) can accept model objects as <br />
parameters:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you do <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you do several actions with one object, like adding many conditions to one run or adding one condition to many runs, reusing the object could boost performance by up to 30% each.<br />
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition ===<br />
Performance studies show that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB.<br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument.<br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful.<br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy session and can be committed<br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Make a lot of changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>.<br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
# (the positional arguments here are: time=None, replace=True, auto_commit=False)<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
'''(!)''' Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
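<br />
The same commit and rollback pattern can be tried without RCDB. The sketch below uses only the standard sqlite3 module and a made-up one-table layout, so it illustrates the idea of deferred commits rather than the RCDB API itself:<br />
<br />
<syntaxhighlight lang="python"><br />
import sqlite3<br />
<br />
conn = sqlite3.connect(":memory:")<br />
conn.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")<br />
conn.commit()<br />
<br />
# many changes, one commit at the end<br />
for run_number in range(100):<br />
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)", (run_number, "ac", 10))<br />
conn.commit()<br />
committed_rows = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]<br />
<br />
# a pending change is already visible on the same connection...<br />
conn.execute("UPDATE conditions SET int_value = 30 WHERE run_number = 1")<br />
pending = conn.execute("SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]<br />
<br />
# ...and can still be rolled back<br />
conn.rollback()<br />
restored = conn.execute("SELECT int_value FROM conditions WHERE run_number = 1").fetchone()[0]<br />
</syntaxhighlight><br />
<br />
After the rollback the value is back to the last committed one, just as in the test above.<br />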
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=64043RCDB conditions python2015-03-07T17:38:43Z<p>Romanov: Performance section added</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are the way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options of conditions include:<br />
<br />
* A condition can also hold the time of its occurrence: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, so it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. But users can use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# Open SQLite database connection<br />
db = rcdb.RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
# Once in a lifetime, create a condition type, that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? If data is bulky and changes rarely (the value is the same for many runs),<br />
it is better not to save it using conditions, because each value is saved independently and attached to a run.<br />
RCDB provides a file saving mechanism for this kind of bulk data. Or maybe CCDB fits better in this case.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
There are ''environment.bash'' and ''environment.csh'' scripts, which automatically set<br />
the environment variables for rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' - to RCDB root directory,<br />
* appends '''$PYTHONPATH''' with $RCDB_HOME/python<br />
* appends '''$PATH''' with rcdb bin folder<br />
<br />
<br />
3. '''Choose database'''<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, there is a create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW example ==<br />
Everything needed to start with RCDB conditions, to put values in and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provide most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print(condition)<br />
print("value = {}".format(condition.value))<br />
print("name = {}".format(condition.name))<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print("{} : {}".format(ct.name, ct.description))<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print("Conditions for run {}".format(run.number))<br />
for condition in run.conditions:<br />
    print("{} = {}".format(condition.name, condition.value))<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
Conditions for run 1<br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections walk through this example and explain it thoroughly.<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. The database<br />
parameters are passed to it as a connection string. It also carries the functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite databases are considered. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite has no user_name, password or host part, the connection string starts with three slashes ///.<br />
Thus there are four slashes //// when an absolute path to the file is given:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
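<br />
The slash counting becomes clearer when the strings are split into their parts. The sketch below uses python's standard <code>urlsplit</code> only to show the structure; SQLAlchemy has its own URL parser, so this is an illustration and not what RCDB calls internally. The port 3306 is just an example value:<br />
<br />
<syntaxhighlight lang="python"><br />
try:<br />
    from urllib.parse import urlsplit    # python 3<br />
except ImportError:<br />
    from urlparse import urlsplit        # python 2<br />
<br />
mysql = urlsplit("mysql://user_name:password@host:3306/database")<br />
mysql_parts = (mysql.scheme, mysql.username, mysql.hostname, mysql.port, mysql.path)<br />
<br />
# SQLite: the part between '//' and the next '/' (user, password, host) is empty<br />
relative = urlsplit("sqlite:///example.db")<br />
absolute = urlsplit("sqlite:////home/user/example.db")<br />
<br />
# dropping the slash that separates the empty host part leaves the file path<br />
relative_file = relative.path[1:]<br />
absolute_file = absolute.path[1:]<br />
</syntaxhighlight><br />
<br />
Here <code>relative_file</code> is <code>example.db</code> and <code>absolute_file</code> is <code>/home/user/example.db</code>, which is where the fourth slash comes from.<br />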
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above, the class constructor is used to connect to the database. But there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# check connection and get connection string from provider<br />
if db.is_connected:<br />
    print("connected to: {}".format(db.connection_string))<br />
<br />
# disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the connect function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, the ''connect'' function raises exceptions if the connection string has a wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<pre><br />
RUNS            CONDITIONS            CONDITION_TYPES<br />
number   <--    run_num               name<br />
                type_id       -->     field_type<br />
                *_value               is_many_per_run<br />
                time<br />
</pre><br />
<br />
<br />
So when we talk about a name-value pair for a run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table; each record of it references a run and a condition_type.<br />
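<br />
The three-table layout can be tried out with the standard sqlite3 module. The table and column names below follow the diagram above, but the exact DDL (the id columns, the column types) is an assumption made for this sketch and not the real RCDB schema:<br />
<br />
<syntaxhighlight lang="python"><br />
import sqlite3<br />
<br />
conn = sqlite3.connect(":memory:")<br />
conn.executescript("""<br />
    CREATE TABLE runs (number INTEGER PRIMARY KEY);<br />
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT,<br />
                                  field_type TEXT, is_many_per_run INTEGER);<br />
    CREATE TABLE conditions (id INTEGER PRIMARY KEY,<br />
                             run_num INTEGER REFERENCES runs(number),<br />
                             type_id INTEGER REFERENCES condition_types(id),<br />
                             int_value INTEGER, time TEXT);<br />
<br />
    INSERT INTO runs VALUES (100);<br />
    INSERT INTO condition_types VALUES (1, 'event_count', 'int', 0);<br />
    INSERT INTO conditions (run_num, type_id, int_value) VALUES (100, 1, 1663);<br />
""")<br />
<br />
# a name-value pair for a run is a join of conditions with condition_types<br />
pairs = conn.execute("""<br />
    SELECT t.name, c.int_value<br />
    FROM conditions c JOIN condition_types t ON t.id = c.type_id<br />
    WHERE c.run_num = ?""", (100,)).fetchall()<br />
</syntaxhighlight><br />
<br />
The query gives the single pair <code>('event_count', 1663)</code> for run 100.<br />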
<br />
<br />
=== Python class structure ===<br />
<br />
The python API data model classes resemble this structure. There are 3 python classes that you work with:<br />
<br />
* '''Run''' - represents a run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores the condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number               # int - The run number<br />
    start_time           # datetime - Run start time<br />
    end_time             # datetime - Run end time<br />
    conditions           # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name                 # str(max 255) - A name of condition<br />
    value_type           # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run      # bool - True if the value is allowed many times per run<br />
    values               # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time                 # datetime - time related to the condition (e.g. when it occurred)<br />
    run_number           # int - the run number<br />
<br />
    @property<br />
    value                # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value           # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value            # holds data if type INT_FIELD<br />
    float_value          # holds data if type FLOAT_FIELD<br />
    bool_value           # holds data if type BOOL_FIELD<br />
<br />
    run                  # Run - Run object associated with the run_number<br />
    type                 # ConditionType - link to associated condition type<br />
    name                 # str - link to type.name. See ConditionType.name<br />
    value_type           # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, in reality the data is stored in one of the fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you call ''Condition.value'' property, Condition class checks for ''type.value_type'' and returns<br />
an appropriate ''xxx_value''.<br />
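<br />
How such a dispatching property can work is sketched below with a stand-alone class. <code>ToyCondition</code> is a hypothetical stand-in, not the real RCDB Condition class; the type strings are the ones listed for ConditionType above, and TIME_FIELD is left out because its storage field is not described here:<br />
<br />
<syntaxhighlight lang="python"><br />
class ToyCondition(object):<br />
    """Stand-in for Condition: one storage field per kind of value."""<br />
<br />
    FIELD_BY_TYPE = {<br />
        "string": "text_value", "json": "text_value", "blob": "text_value",<br />
        "int": "int_value", "float": "float_value", "bool": "bool_value",<br />
    }<br />
<br />
    def __init__(self, value_type):<br />
        self.value_type = value_type<br />
        self.text_value = None<br />
        self.int_value = None<br />
        self.float_value = None<br />
        self.bool_value = None<br />
<br />
    @property<br />
    def value(self):<br />
        # pick the storage field that matches the declared type<br />
        return getattr(self, self.FIELD_BY_TYPE[self.value_type])<br />
<br />
    @value.setter<br />
    def value(self, new_value):<br />
        setattr(self, self.FIELD_BY_TYPE[self.value_type], new_value)<br />
<br />
<br />
count = ToyCondition("int")<br />
count.value = 1663            # ends up in int_value<br />
<br />
note = ToyCondition("json")<br />
note.value = '{"a": 1}'       # ends up in text_value<br />
</syntaxhighlight><br />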
<br />
<br />
'''Why is it so?''' - because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time we would like to store strings and more general data as blobs. To achieve this, RCDB uses the so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs, to be interpreted later.<br />
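<br />
Why typed fields matter for such queries can be seen in a small experiment with the standard sqlite3 module. The table below is a simplified, made-up version of the conditions table that stores every number twice, once as text and once as int:<br />
<br />
<syntaxhighlight lang="python"><br />
import sqlite3<br />
<br />
conn = sqlite3.connect(":memory:")<br />
conn.execute("CREATE TABLE conditions (run_num INTEGER, text_value TEXT, int_value INTEGER)")<br />
for run_num, event_count in [(1, 9), (2, 250000), (3, 99999)]:<br />
    conn.execute("INSERT INTO conditions VALUES (?, ?, ?)",<br />
                 (run_num, str(event_count), event_count))<br />
<br />
# numeric comparison on the int field<br />
as_int = [row[0] for row in conn.execute(<br />
    "SELECT run_num FROM conditions WHERE int_value > 100000 ORDER BY run_num")]<br />
<br />
# the same question asked of the text field compares character by character<br />
as_text = [row[0] for row in conn.execute(<br />
    "SELECT run_num FROM conditions WHERE text_value > '100000' ORDER BY run_num")]<br />
</syntaxhighlight><br />
<br />
Only run 2 passes the numeric filter, while the text comparison also lets runs 1 and 3 through, because '9' sorts after '1'.<br />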
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" should be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (parameter names are added here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't enforce any naming convention and there is no built-in check for<br />
spaces. Spaces are likely to cause problems, though, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't provide special treatment of slashes '/' or directories.<br />
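<br />
Since the API doesn't check names, a project may want its own check before creating condition types. The helper below is hypothetical (''check_condition_name'' is not part of RCDB) and only rejects empty names and names containing whitespace.<br />

```python
import re


def check_condition_name(name):
    """Hypothetical helper, not an RCDB function: reject risky names."""
    if not name or re.search(r"\s", name):
        raise ValueError("bad condition name: %r" % (name,))
    return name


for candidate in ["category/sub/name", "category-sub_name", "event count"]:
    try:
        check_condition_name(candidate)
        print("ok:   " + candidate)
    except ValueError as error:
        print("skip: " + str(error))
```

Slashes and dashes pass the check because RCDB treats names as plain strings; only the name with a space is refused.<br />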
<br />
<br />
'''value_type''' - The second parameter defines the type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run.<br />
<br />
* '''False''' - The API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run.<br />
<br />
* '''True''' - The API allows the '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme.<br />
<br />
<br />
''Explanation'' - Run conditions are expected to behave in one of two ways. Sometimes there should be<br />
strictly one name-value pair per run; "''total_events''" and "''target_material''" are examples. If<br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is<br />
desirable to track how a value changes during a run; the hall "''temperature''" or "''current''" are examples.<br />
If ''is_many_per_run=True'', the API allows setting several values at different times under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description, 255 characters max, that other users can see. It is optional, but filling it in is<br />
very good practice.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of the following field types should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that. Let's look at the first example:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
<br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, the ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case the ''Condition.value'' field holds the time, and the time can be passed as the<br />
value parameter of the add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You can then add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects,<br />
even if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If the run has this condition with the same value and time, the function DOES NOTHING<br />
# 2. If the run has this condition but at a different time, the function adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended for storing arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
An example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side. This is done to give you better control over serialization.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As you can see in the example, we use the usual string "x" in restored_dict["x"] and it just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to plain strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom Python objects to the database, the jsonpickle package can be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if the data doesn't fit into a string or JSON? There is the ConditionType.BLOB_FIELD type.<br />
<br />
The procedure is much like that for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object in whatever way you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it in the same way<br />
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human-readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. Hopefully it will never be needed.<br />
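<br />
One possible way to turn binary data into a string for a BLOB_FIELD is sketched below with the standard library only. The choice of struct, zlib and base64 is an assumption for illustration; RCDB doesn't prescribe any particular encoding, and the sample data is dummy.<br />

```python
import base64
import struct
import zlib

# Dummy payload: 1000 floats packed as raw little-endian doubles
samples = [0.5 * i for i in range(1000)]
raw = struct.pack("<%dd" % len(samples), *samples)

# Serialize: compress the bytes, then base64-encode them into an ASCII string.
# blob_text is the kind of string one would pass to add_condition.
blob_text = base64.b64encode(zlib.compress(raw)).decode("ascii")

# Deserialize: reverse both steps on the string read back from Condition.value
restored_raw = zlib.decompress(base64.b64decode(blob_text))
restored = list(struct.unpack("<%dd" % (len(restored_raw) // 8), restored_raw))

print(restored == samples)
```

Whatever encoding is chosen, the writer and every reader of the condition have to agree on it, since RCDB only sees an opaque string.<br />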
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if a condition value with this name already exists in the DB for this run?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run (is_many_per_run=False):<br />
<br />
# If the run already has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag: with ''replace=True'' the existing value is replaced, otherwise OverrideConditionValueError is raised.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING.<br />
# If the run has this condition but at a different time, the function adds this condition to the DB.<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag, as in the single-value case.<br />
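<br />
The rules for both modes can be summarized in a small stand-alone model. It is only a sketch of the behaviour described on this page, not RCDB's implementation: ''add_value'', ''OverrideError'' and the list-of-tuples store are hypothetical names.<br />

```python
class OverrideError(Exception):
    """Stand-in for RCDB's OverrideConditionValueError."""


def add_value(store, name, value, time=None, replace=False, many_per_run=False):
    """Apply the rules above to `store`, a list of (name, value, time) tuples."""
    for index, (old_name, old_value, old_time) in enumerate(store):
        if old_name != name:
            continue
        if many_per_run and old_time != time:
            continue                      # another time: not a conflict
        if old_value == value and old_time == time:
            return                        # identical record: do nothing
        if not replace:
            raise OverrideError(name)     # conflict and no permission to replace
        store[index] = (name, value, time)
        return
    store.append((name, value, time))     # nothing conflicting: plain addition


store = []
add_value(store, "event_count", 1000)
add_value(store, "event_count", 1000)            # no-op, same value and time
try:
    add_value(store, "event_count", 2222)        # conflict without replace
except OverrideError:
    print("override refused")
add_value(store, "event_count", 2222, replace=True)
print(store)
```

Passing ''many_per_run=True'' makes the model treat records with a different time as separate entries instead of conflicts.<br />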
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2) # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
the objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together and makes it possible to<br />
navigate between objects.<br />
<br />
Let's look at a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It isn't loaded even when we do something like <code>x = run.conditions</code>. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should be avoided. But sometimes it is needed, especially during development and debugging.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible by '''new_var'''.<br />
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with session.delete function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and how to manipulate SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
To get a Run object by run_number:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print run.number<br />
print run.start_time<br />
print run.end_time<br />
print run.conditions  # conditions are described further below<br />
</syntaxhighlight><br />
<br />
Querying runs is described below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say that we want Run objects to be returned. At the same time this determines which table is queried.<br />
<br />
'''.filter(Run.number < 100)''' - filtering clause<br />
<br />
Once the query is ready, we can get the objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's look at a more advanced example:<br />
<syntaxhighlight lang="python"><br />
from sqlalchemy import desc<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues one extra SELECT query to the DB to get its<br />
conditions. This might be OK in many cases.<br />
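<br />
If the extra per-run queries become a bottleneck, the values can be fetched together with the runs in one joined query. The sketch below shows the idea at the plain SQL level with the built-in sqlite3 module; the two-table schema and the dummy values are simplified assumptions, not the real RCDB schema.<br />

```python
import sqlite3

# Simplified, made-up schema: one table for runs, one for condition values
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (number INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE conditions (run_number INTEGER, name TEXT, int_value INTEGER)")
for number in range(50, 56):
    con.execute("INSERT INTO runs VALUES (?)", (number,))
    con.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)",
                (number, number * 1000))

# One query returns (run, value) pairs for the whole range,
# instead of one extra SELECT per selected run
rows = con.execute(
    "SELECT runs.number, conditions.int_value "
    "FROM runs JOIN conditions ON conditions.run_number = runs.number "
    "WHERE conditions.name = 'event_count' "
    "AND runs.number BETWEEN 50 AND 55 "
    "ORDER BY runs.number DESC").fetchall()
print(rows[0])
```

The raw SQLAlchemy queries in the next section build the same kind of join through the ORM.<br />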
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, the full<br />
power of SQLAlchemy queries is available.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(ConditionType.name == "event_count")\<br />
    .filter(Condition.int_value > 100000)\<br />
    .order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print query.count()<br />
<br />
# get first run from selected<br />
print query.first()<br />
<br />
# get all runs that match the criteria<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you can probably already see the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. This is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Second, when you add more logic, the query becomes bulky.<br />
<br />
<br />
Consider the next example. We look for runs in the range 1000 to 2000 that have event_count > 10000 or a data_value between 1.2 and 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
    .filter(Run.number.between(1000, 2000))\<br />
    .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
            ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
    .order_by(Run.number)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
Note that '''&''' and '''|''' are used instead of the usual logical operators ('''and''' and '''or''' in Python, '''&&''' and '''||''' in many other languages).<br />
SQLAlchemy overloads these operators to build SQL expressions.<br />
<br />
Note also that each such expression has to be in parentheses. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but that doesn't improve readability.<br />
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
The RCDB ConditionType class provides helpful properties that make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print query.all()<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. It hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
....query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value field for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field, Condition.int_value, for the comparison.<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. The queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import ConditionType, Run<br />
<br />
# create in-memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create condition types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)        # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)   # data_value in range 1 - 2<br />
<br />
<br />
""" Demonstrates ConditionType query helpers"""<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53)<br />
print query.all()<br />
<br />
# select runs where 1.52 <= data_value <= 1.7<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print query2.all()<br />
<br />
# combine results of these two queries<br />
print "Results intersect:"<br />
print query.intersect(query2).all()<br />
print "Results union:"<br />
print query.union(query2).all()<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
<br />
<br />
More on SQLAlchemy queries in<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
[http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py first.)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores some information about what is going on in the ''log_records''<br />
table of the same database.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creation of condition types is logged automatically<br />
* Manipulations of condition values are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if every condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations on conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, like creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Performance ==<br />
<br />
<br />
<br />
<br />
=== Reusing objects ===<br />
<br />
<br />
Most of the API functions (like <code>add_condition(...)</code> or <code>get_condition(...)</code>) accept model objects in place of<br />
the run number and the condition name:<br />
<br />
<syntaxhighlight lang="python"><br />
# 1. Using run number and condition name<br />
db.add_condition(1, "my_value", 10)<br />
<br />
# 2. Using model objects<br />
run = db.get_run(1)<br />
ct = db.get_condition_type("my_value")<br />
db.add_condition(run, ct, 10)<br />
</syntaxhighlight><br />
<br />
<br />
When you call <code>db.add_condition(1, "my_value", 10)</code>, the condition type and the run are queried inside the function. If you perform several actions on one object, such as adding many conditions to one run or adding one condition to many runs, reusing the object can boost performance by up to 30% in each case. <br />
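<br />
The "one condition to many runs" case can be sketched as a small helper that resolves the condition type once and reuses it. The helper name <code>add_condition_to_runs</code> is hypothetical; it takes an already connected provider as a parameter and relies only on <code>get_condition_type</code> and <code>add_condition</code> as described above.<br />
<br />
```python
def add_condition_to_runs(db, name, values_by_run):
    """Add one condition to many runs, querying its type only once.

    db            -- an already connected RCDBProvider (assumed)
    name          -- condition name, e.g. "my_value"
    values_by_run -- dict mapping run number -> value
    """
    # Query the condition type once instead of once per add_condition call
    ct = db.get_condition_type(name)

    for run_number, value in values_by_run.items():
        # Pass the model object instead of the name, so it is not queried again
        db.add_condition(run_number, ct, value)

    return ct
```
<br />
A call would look like <code>add_condition_to_runs(db, "my_value", {1: 10, 2: 20})</code>.<br />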
<br />
<br />
<br />
<br />
<br />
=== Auto commit value addition ===<br />
A performance study shows that approximately 50% of the time spent in <code>add_condition(...)</code> is used to commit changes to the DB. <br />
<br />
To speed up adding conditions, the <code>add_condition(...)</code> function has an optional '''auto_commit''' argument. <br />
By default it is '''True''': changes are committed to the DB if the ''add_condition'' call is successful. <br />
Setting ''auto_commit''='''False''' defers the commit: changes stay pending in the SQLAlchemy cache and can be committed <br />
manually later.<br />
<br />
<br />
The purposes of ''auto_commit''='''False''' are:<br />
<br />
* Make many changes and commit them all at once, gaining performance<br />
* Roll back changes<br />
<br />
<br />
To commit the changes, given <code>db = RCDBProvider(...)</code>, call <code>db.session.commit()</code>.<br />
<br />
<br />
<syntaxhighlight lang="python"><br />
""" Test auto_commit feature that allows to commit changes to DB later"""<br />
ct = self.db.create_condition_type("ac", ConditionType.INT_FIELD, False)<br />
<br />
# Add condition but don't commit changes<br />
self.db.add_condition(1, ct, 10, auto_commit=False)<br />
<br />
# But the object is selectable already<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
<br />
# Commit session. Now "ac"=10 is stored in the DB<br />
self.db.session.commit()<br />
<br />
# Now we defer committing changes to DB. Object is in SQLAlchemy cache<br />
self.db.add_condition(1, ct, 20, None, True, False)<br />
self.db.add_condition(1, ct, 30, None, True, False)<br />
<br />
# If we select this object, SQLAlchemy gives us changed version<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 30)<br />
<br />
# Roll back changes<br />
self.db.session.rollback()<br />
val = self.db.get_condition(1, ct)<br />
self.assertEqual(val.value, 10)<br />
</syntaxhighlight><br />
<br />
<br />
The example is available in tests:<br />
<br />
<pre><br />
$RCDB_HOME/python/tests/test_conditions.py<br />
</pre><br />
<br />
<br />
(!) Note, however, that more complex scenarios with uncommitted objects haven't been tested.<br />
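<br />
A minimal sketch of the "many changes, one commit" pattern follows. The helper name <code>add_conditions_in_one_commit</code> is hypothetical; it assumes a connected provider and uses only <code>add_condition(..., auto_commit=False)</code>, <code>db.session.commit()</code> and <code>db.session.rollback()</code> as shown above. Given the caveat about untested scenarios, treat it as an illustration rather than a recommended recipe.<br />
<br />
```python
def add_conditions_in_one_commit(db, run_number, values):
    """Add many conditions to one run and commit them all at once.

    db     -- an already connected RCDBProvider (assumed)
    values -- dict mapping condition name -> value
    """
    try:
        for name, value in values.items():
            # Defer the commit: the change stays pending in the session
            db.add_condition(run_number, name, value, auto_commit=False)
        # One commit for all pending changes
        db.session.commit()
    except Exception:
        # Discard everything that was pending, then let the caller see the error
        db.session.rollback()
        raise
```
<br />
Re-raising after the rollback keeps the failure visible, so nothing is half-written and nothing is silently dropped.<br />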
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanovhttps://halldweb1.jlab.org/wiki/index.php?title=RCDB_conditions_python&diff=64015RCDB conditions python2015-03-07T02:26:16Z<p>Romanov: </p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Run conditions are a way to store information related to a run (which is identified by run_number everywhere).<br />
From a simplistic point of view, run conditions are presented in RCDB as '''name'''-'''value''' pairs attached to a<br />
run number. For example, '''event_count''' = '''1663''' for run '''100'''.<br />
<br />
<br />
More versatile options for conditions include:<br />
<br />
* A condition can also hold the time at which it occurred: '''name - value (+time)'''<br />
* Several values can be attached under the same name to the same run, so it looks like '''name''' - '''[(value1, time1), (value2, time2), ... ]'''<br />
* Conversely, the API can ensure that there is strictly one value per run<br />
* Different types of values are supported<br />
<br />
<br />
This tutorial covers the RCDB conditions Python API, which provides complete tooling for conditions management.<br />
The API is built on the SQLAlchemy ORM, which unifies the workflow for MySQL and SQLite databases<br />
(and many more, actually). The RCDB API hides many complexities of SQLAlchemy and provides simple and<br />
straightforward functions to manage conditions. Users can still use the full power of SQLAlchemy for querying and<br />
filtering results if they wish.<br />
<br />
<br />
Let's see how the Python code would look for the example above. Read event_count for run 100:<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
<br />
# Open SQLite database connection<br />
db = RCDBProvider("sqlite:///path.to.file.db")<br />
<br />
# Read value for run 100<br />
event_count = db.get_condition(100, "event_count").value<br />
</syntaxhighlight><br />
<br />
<br />
Write ''event_count''=''1663'' for run ''100'':<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.model import ConditionType<br />
<br />
# Once in a lifetime, create a condition type that defines event_count<br />
ct = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
<br />
# Write condition value to run 100<br />
db.add_condition(100, "event_count", 1663)<br />
</syntaxhighlight><br />
<br />
<br />
What are RCDB conditions not designed for? If data is bulky and changes rarely (the value is the same for many runs),<br />
it is better not to store it as conditions, because each value is saved independently and attached to a run.<br />
RCDB provides a file-saving mechanism for this kind of bulk data, or CCDB may fit better in this case.<br />
<br />
<br />
<br />
== Installation ==<br />
<br />
1. '''Get rcdb'''.<br />
<br />
RCDB svn is:<br />
<br />
https://halldsvn.jlab.org/repos/trunk/online/daq/rcdb/rcdb<br />
<br />
<br />
2. '''Set environment'''.<br />
<br />
The ''environment.bash'' and ''environment.csh'' scripts automatically set<br />
the environment variables for rcdb:<br />
<br />
<syntaxhighlight lang="bash"><br />
source environment.bash<br />
</syntaxhighlight><br />
<br />
The script:<br />
<br />
* sets '''$RCDB_HOME''' to the RCDB root directory,<br />
* appends $RCDB_HOME/python to '''$PYTHONPATH''',<br />
* appends the rcdb bin folder to '''$PATH'''<br />
<br />
<br />
3. '''Choose database'''.<br />
<br />
The main database is the MySQL database in the counting house. The connection string is:<br />
<br />
<pre><br />
mysql://rcdb:<well_known_pwd>@gluondb/rcdb<br />
</pre><br />
<br />
SQLite database snapshot is also available at:<br />
<br />
<pre><br />
/u/group/halld/Software/rcdb<br />
</pre><br />
<br />
<br />
To experiment with RCDB and the examples below, there is a create_empty_sqlite.py script in the $RCDB_HOME/python folder.<br />
The script creates an empty SQLite database. The usage is:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py path_to_database.db<br />
</syntaxhighlight><br />
<br />
<br />
<br />
== ALL YOU HAVE TO KNOW example ==<br />
At least, this is all you need to start with RCDB conditions, to store values and to get them back:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# 1. Create RCDBProvider object that connects to DB and provides most of the functions<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# 2. Create condition type. It is done only once<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, is_many_per_run=False, description="This is my value")<br />
<br />
# 3. Add data to database<br />
db.add_condition(1, "my_val", 1000)<br />
<br />
# Replace previous value<br />
db.add_condition(1, "my_val", 2000, replace=True)<br />
<br />
# 4. Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
<br />
</syntaxhighlight><br />
<br />
The script result:<br />
<pre><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
</pre><br />
<br />
<br />
More actions on objects:<br />
<br />
<syntaxhighlight lang="python"><br />
# 5. Get all existing conditions names and their descriptions<br />
for ct in db.get_condition_types():<br />
    print ct.name, ':', ct.description<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val : This is my value<br />
</pre><br />
<br />
<br />
<syntaxhighlight lang="python"><br />
# 6. Get all values for the run 1<br />
run = db.get_run(1)<br />
print "Conditions for run {}".format(run.number)<br />
for condition in run.conditions:<br />
    print condition.name, '=', condition.value<br />
</syntaxhighlight><br />
<br />
<br />
The script result:<br />
<pre><br />
my_val = 2000<br />
</pre><br />
<br />
<br />
<br />
The example is also available as:<br />
<br />
<syntaxhighlight lang="bash"><br />
$RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
<br />
<br />
It is assumed that 'example.db' is an SQLite database created by the ''create_empty_sqlite.py'' script. To run it:<br />
<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_basic.py<br />
</syntaxhighlight><br />
'''(!)''' Note that to run the script again you will probably have to delete the database first: <code>rm example.db</code><br />
<br />
The next sections go through this example and explain each part in detail.<br />
<br />
<br />
<br />
== Connection ==<br />
<br />
<syntaxhighlight lang="python"><br />
db = RCDBProvider("sqlite:///example.db")<br />
</syntaxhighlight><br />
<br />
RCDBProvider is an object that holds the database session and provides connect/disconnect functions. It uses connection<br />
strings to pass database parameters to the class. It also provides functions to manage run conditions and other<br />
RCDB data.<br />
<br />
<br />
The functions usually return database model objects (described in the [[#Data model|next section]]).<br />
Additional manipulations of these objects can be done with SQLAlchemy (described later).<br />
<br />
<br />
For now, MySQL and SQLite databases are considered. The connection strings for them are:<br />
<br />
'''MySQL'''<br />
<pre><br />
mysql://user_name:password@host:port/database<br />
</pre><br />
<br />
<br />
'''SQLite'''<br />
<pre><br />
sqlite:///path_to_file<br />
</pre><br />
'''(!)''' Note that because SQLite doesn't have user_name and password, the connection string starts with three slashes ///.<br />
An absolute path to a file therefore gives four slashes ////:<br />
<pre><br />
sqlite:////home/user/example.db<br />
</pre><br />
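<br />
To avoid counting slashes by hand, the connection string can be built from a file path. The helper name <code>sqlite_connection_string</code> is hypothetical and not part of RCDB; the sketch assumes a POSIX system, where absolute paths start with a slash.<br />
<br />
```python
import os


def sqlite_connection_string(path):
    """Return an SQLite connection string for the given file path."""
    # On POSIX, abspath yields e.g. /home/user/example.db, so the result has
    # four slashes: three from the prefix and one from the path itself
    return "sqlite:///" + os.path.abspath(path)


print(sqlite_connection_string("/home/user/example.db"))
# On POSIX this prints: sqlite:////home/user/example.db
```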
<br />
<br />
More about connections can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls SQLAlchemy documentation].<br />
<br />
<br />
In the example above, the class constructor is used to connect to the database, but there are more connection functions:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create provider without connecting<br />
db = RCDBProvider()<br />
<br />
# Connect to database<br />
db.connect("sqlite:///example.db")<br />
<br />
# Check connection and get connection string from provider<br />
if db.is_connected:<br />
    print "connected to:", db.connection_string<br />
<br />
# Disconnect from DB<br />
db.disconnect()<br />
</syntaxhighlight><br />
<br />
'''(!)''' Note that the ''connect'' function doesn't really connect to the database. It just creates the so-called ''engine'' and ''session''<br />
objects using the connection string. Thus, ''connect'' raises exceptions if the connection string has the wrong format<br />
or the required libraries are missing from the system. But if there is no physical connection to MySQL or there is no such<br />
SQLite file, <ins>the function doesn't raise any errors</ins>. In that case the errors are raised on the first data retrieval.<br />
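<br />
One way to surface such errors early is to force a cheap query right after connecting. The sketch below is only an illustration: the helper name <code>connect_and_check</code> is hypothetical, it uses <code>connect</code>, <code>disconnect</code> and <code>get_condition_types</code> as they appear in this tutorial, and it catches the generic <code>Exception</code> because the exact exception classes are not listed here.<br />
<br />
```python
def connect_and_check(db, connection_string):
    """Connect and immediately force a real query.

    Returns None on success, or the exception raised by the first
    data retrieval so the caller can decide how to report it.
    """
    # May raise right away on a malformed string or missing libraries
    db.connect(connection_string)
    try:
        # Any data retrieval will do; it makes the session touch the database
        db.get_condition_types()
    except Exception as error:  # exact exception classes are an open assumption
        db.disconnect()
        return error
    return None
```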
<br />
<br />
<br />
== Data model ==<br />
<br />
=== Database structure ===<br />
<br />
At the database level, the conditions part is represented by 3 tables:<br />
<br />
<br />
<pre><br />
RUNS            CONDITIONS          CONDITION_TYPES<br />
number   <--    run_num             name<br />
                type_id      -->    field_type<br />
                *_value             is_many_per_run<br />
                time<br />
</pre><br />
<br />
<br />
So when we talk about name-value pair for the run, this actually means that:<br />
<br />
* The run number and other run information (like start and end times) are stored in the runs table.<br />
* Names and value types are stored in the condition_types table.<br />
* And, finally, values are stored in the conditions table, each record of which references a run and a condition_type.<br />
<br />
<br />
=== Python class structure ===<br />
<br />
The Python API data model classes resemble this structure. There are 3 Python classes that you work with:<br />
<br />
* '''Run''' - represents run<br />
* '''Condition''' - stores data for the run<br />
* '''ConditionType''' - stores condition name, field type and other properties<br />
<br />
<br />
All classes have properties to reference each other. The main properties for conditions management are:<br />
<br />
<syntaxhighlight lang="python"><br />
class Run(ModelBase):<br />
    number          # int - The run number<br />
    start_time      # datetime - Run start time<br />
    end_time        # datetime - Run end time<br />
    conditions      # list[Condition] - Conditions associated with the run<br />
<br />
<br />
class ConditionType(ModelBase):<br />
    name            # str(max 255) - A name of condition<br />
    value_type      # str(max 255) - Type name. One of XXX_FIELD (see below)<br />
    is_many_per_run # bool - True if the value is allowed many times per run<br />
    values          # query[Condition] - query to look up condition values for runs<br />
<br />
    # Constants, used for declaration of value_type<br />
    STRING_FIELD = "string"<br />
    INT_FIELD = "int"<br />
    BOOL_FIELD = "bool"<br />
    FLOAT_FIELD = "float"<br />
    JSON_FIELD = "json"<br />
    BLOB_FIELD = "blob"<br />
    TIME_FIELD = "time"<br />
<br />
<br />
class Condition(ModelBase):<br />
    time            # datetime - time related to condition (e.g. when it occurred)<br />
    run_number      # int - the run number<br />
<br />
    @property<br />
    value           # int, float, bool or string - depending on type. The condition value<br />
<br />
    text_value      # holds data if type STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
    int_value       # holds data if type INT_FIELD<br />
    float_value     # holds data if type FLOAT_FIELD<br />
    bool_value      # holds data if type BOOL_FIELD<br />
<br />
    run             # Run - Run object associated with the run_number<br />
    type            # ConditionType - link to associated condition type<br />
    name            # str - link to type.name. See ConditionType.name<br />
    value_type      # str - link to type.value_type. See ConditionType.value_type<br />
</syntaxhighlight><br />
<br />
<br />
=== How data is stored in the DB ===<br />
<br />
As you may have noticed from the comments above, the data is actually stored in one of the following fields:<br />
<br />
{| class="wikitable"<br />
!Storage field<br />
!Value type<br />
|-<br />
|text_value<br />
|STRING_FIELD, JSON_FIELD or BLOB_FIELD<br />
|-<br />
|int_value<br />
|INT_FIELD<br />
|-<br />
|float_value<br />
|FLOAT_FIELD<br />
|-<br />
|bool_value<br />
|BOOL_FIELD<br />
|}<br />
<br />
When you read the ''Condition.value'' property, the Condition class checks ''type.value_type'' and returns<br />
the appropriate ''xxx_value''.<br />
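<br />
The dispatch can be pictured with a toy class. This is a sketch and not the RCDB <code>Condition</code> class: the name <code>ConditionSketch</code> is hypothetical, the type strings come from the constants listed above, and mapping <code>"time"</code> to the <code>time</code> field is an assumption.<br />
<br />
```python
# Which storage field holds the data for each value_type
STORAGE_FIELDS = {
    "string": "text_value",
    "json": "text_value",
    "blob": "text_value",
    "int": "int_value",
    "float": "float_value",
    "bool": "bool_value",
    "time": "time",  # assumption: TIME_FIELD values live in the time field
}


class ConditionSketch(object):
    """Toy stand-in for Condition, only to illustrate the value property."""

    def __init__(self, value_type):
        self.value_type = value_type
        self.text_value = None
        self.int_value = None
        self.float_value = None
        self.bool_value = None
        self.time = None

    @property
    def value(self):
        # Pick the storage field that matches the declared type
        return getattr(self, STORAGE_FIELDS[self.value_type])

    @value.setter
    def value(self, new_value):
        setattr(self, STORAGE_FIELDS[self.value_type], new_value)


condition = ConditionSketch("int")
condition.value = 1663
print(condition.int_value)  # 1663
```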
<br />
<br />
'''Why is it so?''' Because we would like to have queries like: ''"give me runs where event_count > 100 000"''<br />
<br />
i.e., if we know that ''event_count'' is an int, we would like the database to treat it as an int.<br />
<br />
At the same time, we would like to store strings and more general data as blobs. To achieve this, RCDB uses a so-called<br />
''"hybrid approach to object-attribute-value model"''. If a value is an int, float, bool or time, it is stored in the appropriate field,<br />
which allows its type to be used when querying. As a result, it is possible to search over ints, floats and times and, at the same time,<br />
to store more complex objects as JSON or blobs and interpret them later.<br />
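<br />
The benefit of typed columns can be seen with a toy schema in an in-memory SQLite database. The sketch uses Python's standard <code>sqlite3</code> module rather than RCDB, and the table layout is a simplified assumption loosely based on the structure above, not the real RCDB schema.<br />
<br />
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT, value_type TEXT);
    CREATE TABLE conditions (
        run_number INTEGER, type_id INTEGER,
        text_value TEXT, int_value INTEGER, float_value REAL, bool_value INTEGER);
    INSERT INTO condition_types VALUES (1, 'event_count', 'int');
    INSERT INTO conditions (run_number, type_id, int_value) VALUES (100, 1, 1663);
    INSERT INTO conditions (run_number, type_id, int_value) VALUES (101, 1, 250000);
""")

# "Give me runs where event_count > 100 000": the comparison is numeric
rows = con.execute("""
    SELECT c.run_number FROM conditions c
    JOIN condition_types t ON t.id = c.type_id
    WHERE t.name = 'event_count' AND c.int_value > 100000
""").fetchall()
runs = [row[0] for row in rows]
print(runs)  # [101]

# If everything were stored as text, the comparison would be lexicographic
print("1663" > "100000")  # True, which is the wrong answer for numbers
```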
<br />
<br />
<br />
== Creating condition types ==<br />
<br />
To save data in run conditions, a "condition type" should be created first. It is done once in a database lifetime.<br />
Let's look at ''create_condition_type'' from the example above (we add parameter names here):<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type(name="my_val",<br />
                         value_type=ConditionType.INT_FIELD,<br />
                         is_many_per_run=False,<br />
                         description="This is my value")<br />
</syntaxhighlight><br />
<br />
<br />
'''name''' - The first parameter is the condition name. When we say "event_count for run 100", "event_count" is that name.<br />
Names are case sensitive. The API doesn't validate names against any naming convention, and there is no built-in check for<br />
spaces. But spaces would definitely cause problems, so they are not recommended.<br />
<br />
It is possible to have names like:<br />
<br />
<syntaxhighlight lang="python"><br />
category/sub/name<br />
category-sub-name<br />
category-sub_name<br />
</syntaxhighlight><br />
<br />
Names are just strings. RCDB doesn't give slashes '/' or directory-like names any special treatment.<br />
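<br />
Since the API does not validate names, a project may want its own check before creating condition types. The sketch below is one possible convention (letters, digits, '_', '-' and '/'), not something RCDB enforces; the function name <code>is_safe_condition_name</code> is hypothetical.<br />
<br />
```python
import re

# Assumed convention: letters, digits, '_', '-' and '/' only, so no spaces
_SAFE_NAME = re.compile(r"[A-Za-z0-9_/-]+\Z")


def is_safe_condition_name(name):
    """Return True if the name follows the assumed convention."""
    return bool(_SAFE_NAME.match(name))


print(is_safe_condition_name("category/sub/name"))  # True
print(is_safe_condition_name("my value"))           # False
```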
<br />
<br />
'''value_type''' - The second parameter defines type of the value. It can be one of:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
* ConditionType.TIME_FIELD<br />
* ConditionType.JSON_FIELD<br />
* ConditionType.BLOB_FIELD<br />
<br />
More examples of how to use the types are presented in the next section.<br />
<br />
<br />
'''is_many_per_run''' - Allows storing many values with different times for the same run<br />
<br />
* '''False''' - API works as '''name''' - '''value'''(time), i.e. it checks that there is only one value per run<br />
<br />
* '''True''' - API allows '''name''' - '''[(value1, time1), (value2, time2), ...]''' scheme. <br />
<br />
<br />
''Explanation'' - Two different behaviours are assumed for run conditions. Sometimes it is intended to <br />
have strictly one name-value pair per run; "''total_events''" and "''target_material''" are examples. If <br />
''is_many_per_run=False'', the API checks that there is '''only one''' value per run. But sometimes it is <br />
desirable to track how a value changes during a run; hall "''temperature''" and "''current''" are examples. <br />
If ''is_many_per_run=True'', the API allows several values at different times to be stored under the same name for the same run.<br />
<br />
More examples are given in [[#Replacing previous values]].<br />
<br />
<br />
'''description''' - A human-readable description, 255 characters max, that other users can see. It is optional, but it is very<br />
good practice to fill it in.<br />
<br />
<br />
<br />
<br />
== Adding data to database ==<br />
<br />
<br />
=== Basic types: int, float, bool, string ===<br />
<br />
To store basic types, one of the following field types should be used:<br />
<br />
* ConditionType.STRING_FIELD<br />
* ConditionType.INT_FIELD<br />
* ConditionType.BOOL_FIELD<br />
* ConditionType.FLOAT_FIELD<br />
<br />
<br />
Let's look at an example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition types<br />
db.create_condition_type("int_val", ConditionType.INT_FIELD, False)<br />
db.create_condition_type("float_val", ConditionType.FLOAT_FIELD, False)<br />
db.create_condition_type("bool_val", ConditionType.BOOL_FIELD, False)<br />
db.create_condition_type("string_val", ConditionType.STRING_FIELD, False)<br />
<br />
# Add values to run 1<br />
db.add_condition(1, "int_val", 1000)<br />
db.add_condition(1, "float_val", 2.5)<br />
db.add_condition(1, "bool_val", True)<br />
db.add_condition(1, "string_val", "test test")<br />
<br />
# Read values for run 1 and use them<br />
<br />
condition = db.get_condition(1, "int_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "float_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "bool_val")<br />
print condition.value<br />
<br />
condition = db.get_condition(1, "string_val")<br />
print condition.value<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
1000<br />
2.5<br />
True<br />
test test<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== Time information ===<br />
<br />
Time information can be attached to any condition value. The standard Python datetime is used for that (let's revisit the first example):<br />
<br />
<syntaxhighlight lang="python"><br />
# Create condition type<br />
db.create_condition_type("my_val", ConditionType.INT_FIELD, False)<br />
<br />
# Add value and time information<br />
db.add_condition(1, "my_val", 2000, datetime(2015, 10, 10, 15, 28, 12, 111111))<br />
<br />
# Get condition from database<br />
condition = db.get_condition(1, "my_val")<br />
<br />
print condition<br />
print "value =", condition.value<br />
print "name =", condition.name<br />
print "time =", condition.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
<Condition id='1', run_number='1', value=2000><br />
value = 2000<br />
name = my_val<br />
time = 2015-10-10 15:28:12.111111<br />
</syntaxhighlight><br />
<br />
<br />
If time is the only relevant information for a condition, then ConditionType.TIME_FIELD type can be used to create<br />
the condition type. In this case ''Condition.value'' field will have time information and time can be passed as<br />
value parameter of add_condition function:<br />
<br />
<syntaxhighlight lang="python"><br />
db.create_condition_type("lunch_bell_rang", ConditionType.TIME_FIELD, False)<br />
<br />
# add value to run 1<br />
time = datetime(2015, 9, 1, 14, 21, 1)<br />
db.add_condition(1, "lunch_bell_rang", time)<br />
<br />
# get from DB<br />
val = db.get_condition(1, "lunch_bell_rang")<br />
print val.value<br />
print val.time<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<syntaxhighlight lang="python"><br />
2015-09-01 14:21:01<br />
2015-09-01 14:21:01<br />
</syntaxhighlight><br />
<br />
Note that ''val.value'' and ''val.time'' are the same in this example.<br />
<br />
<br />
<br />
=== Multiple values per run ===<br />
<br />
To add many values of the same type, the ''is_many_per_run'' parameter of the ''create_condition_type'' function should be set<br />
to True. You can then add many condition values per run, specifying a time for each of them.<br />
<br />
<br />
'''(!)''' If '''is_many_per_run=True''', then '''get_condition''' returns a list of Condition objects, <ins>even</ins><br />
if only one object is selected.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
# Many condition values allowed for the run (is_many_per_run=True)<br />
# 1. If run has this condition, with the same value and actual_time the func. DOES NOTHING<br />
# 2. If run has this conditions but at different time, it adds this condition to DB<br />
<br />
db.create_condition_type("multi", ConditionType.INT_FIELD, True)<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
<br />
# First addition to DB. Time is None<br />
db.add_condition(1, "multi", 2222)<br />
<br />
# Ok. Value for time1 is added to DB<br />
db.add_condition(1, "multi", 3333, time1)<br />
db.add_condition(1, "multi", 4444, time2)<br />
<br />
results = db.get_condition(1, "multi")<br />
<br />
# We should get 3 values as:<br />
# 0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2<br />
# lets check it<br />
print results<br />
values = [result.value for result in results]<br />
times = [result.time for result in results]<br />
print values<br />
print times<br />
</syntaxhighlight><br />
<br />
The output:<br />
<br />
<syntaxhighlight lang="python"><br />
[<Condition id='1', run_number='1', value=2222>, <Condition id='2', run_number='1', value=3333>, <Condition id='3', run_number='1', value=4444>]<br />
[2222, 3333, 4444]<br />
[None, datetime.datetime(2015, 9, 1, 14, 21, 1, 222), datetime.datetime(2015, 9, 1, 14, 21, 1, 333)]<br />
</syntaxhighlight><br />
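<br />
Because the result is a list, a common follow-up is picking the most recent value. A minimal sketch, assuming only that each item has <code>time</code> and <code>value</code> attributes as in the output above; the helper name <code>latest_condition</code> is hypothetical and not part of RCDB.<br />
<br />
```python
def latest_condition(conditions):
    """Return the condition with the most recent time, or None.

    Conditions without a time (time is None) are ignored, because they
    cannot be ordered against the timed ones.
    """
    timed = [c for c in conditions if c.time is not None]
    if not timed:
        return None
    return max(timed, key=lambda c: c.time)
```
<br />
With the three values from the example, <code>latest_condition(results).value</code> would be 4444.<br />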
<br />
<br />
<br />
=== Arrays and dictionaries ===<br />
<br />
Multiple values per run are '''NOT''' intended to store arrays of data.<br />
<br />
<br />
The best way to store arrays and dictionaries is to serialize them to JSON. Use ConditionType.JSON_FIELD for that.<br />
The RCDB conditions API doesn't provide mechanisms for converting objects to and from JSON.<br />
For arrays this is easily done with the json module.<br />
<br />
<br />
The example from the [https://docs.python.org/2/library/json.html Python 2.7 documentation]:<br />
<br />
<syntaxhighlight lang="python"><br />
>>> import json<br />
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])<br />
'["foo", {"bar": ["baz", null, 1.0, 2]}]'<br />
<br />
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')<br />
[u'foo', {u'bar': [u'baz', None, 1.0, 2]}]<br />
</syntaxhighlight><br />
<br />
So, serialization is on your side, which gives you better control over it.<br />
This means that '''if the condition type is JSON_FIELD, the ''add_condition'' function expects a string''' and '''after you<br />
get the condition back, Condition.value contains a string'''.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
import json<br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("list_data", ConditionType.JSON_FIELD, False)<br />
db.create_condition_type("dict_data", ConditionType.JSON_FIELD, False)<br />
<br />
list_to_store = [1, 2, 3]<br />
dict_to_store = {"x": 1, "y": 2, "z": 3}<br />
<br />
# Dump values to JSON and save it to DB to run 1<br />
db.add_condition(1, "list_data", json.dumps(list_to_store))<br />
db.add_condition(1, "dict_data", json.dumps(dict_to_store))<br />
<br />
# Get condition from database<br />
restored_list = json.loads(db.get_condition(1, "list_data").value)<br />
restored_dict = json.loads(db.get_condition(1, "dict_data").value)<br />
<br />
print restored_list<br />
print restored_dict<br />
<br />
print restored_dict["x"]<br />
print restored_dict["y"]<br />
print restored_dict["z"]<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[1, 2, 3]<br />
{u'y': 2, u'x': 1, u'z': 3}<br />
1<br />
2<br />
3<br />
</pre><br />
<br />
<br />
The example is located at<br />
<br />
<syntaxhighlight lang="python"><br />
$RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
and can be run as:<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/create_empty_sqlite.py example.db<br />
python $RCDB_HOME/python/example_conditions_store_array.py<br />
</syntaxhighlight><br />
<br />
As you may notice, strings come back as unicode after JSON deserialization (note u'x' instead of just 'x').<br />
This is not a problem if you just work with the data, because Python handles unicode strings seamlessly.<br />
As the example shows, the usual string "x" in restored_dict["x"] just works.<br />
<br />
If it is a problem, there is a<br />
[http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python Stack Overflow question on that].<br />
<br />
Using PyYAML to deserialize to strings looks easy.<br />
<br />
<br />
<br />
=== Custom python objects ===<br />
<br />
To save custom python objects to database, jsonpickle package could be used. It is an open source project available<br />
via pip install. It is not shipped with RCDB at the moment.<br />
<br />
<syntaxhighlight lang="python"><br />
from rcdb.provider import RCDBProvider<br />
from rcdb.model import ConditionType<br />
import jsonpickle<br />
<br />
<br />
class Cat(object):<br />
    def __init__(self, name):<br />
        self.name = name<br />
        self.mice_eaten = 1230<br />
<br />
<br />
# Create RCDBProvider provider object and connect it to DB<br />
db = RCDBProvider("sqlite:///example.db")<br />
<br />
# Create condition type<br />
db.create_condition_type("cat", ConditionType.JSON_FIELD, False)<br />
<br />
<br />
# Create a cat and store it in the DB for run 1<br />
cat = Cat('Alice')<br />
db.add_condition(1, "cat", jsonpickle.encode(cat))<br />
<br />
# Get condition from database for run 1<br />
condition = db.get_condition(1, "cat")<br />
loaded_cat = jsonpickle.decode(condition.value)<br />
<br />
print "How cat is stored in DB:"<br />
print condition.value<br />
print "Deserialized cat:"<br />
print "name:", loaded_cat.name<br />
print "mice_eaten:", loaded_cat.mice_eaten<br />
</syntaxhighlight><br />
<br />
The result:<br />
<br />
<syntaxhighlight lang="python"><br />
How cat is stored in DB:<br />
{"py/object": "__main__.Cat", "name": "Alice", "mice_eaten": 1230}<br />
Deserialized cat:<br />
name: Alice<br />
mice_eaten: 1230<br />
</syntaxhighlight><br />
<br />
<br />
[http://jsonpickle.github.io jsonpickle Documentation]<br />
<br />
jsonpickle installation:<br />
<br />
system level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install jsonpickle<br />
</syntaxhighlight><br />
<br />
user level:<br />
<br />
<syntaxhighlight lang="bash"><br />
pip install --user jsonpickle<br />
</syntaxhighlight><br />
<br />
<br />
<br />
=== STRING_FIELD vs. JSON_FIELD vs. BLOB_FIELD ===<br />
<br />
What if data doesn't fit into the string or JSON? There is ConditionType.BLOB_FIELD type.<br />
<br />
The concise instructions are much like those for JSON:<br />
<br />
* Set the condition type to BLOB_FIELD<br />
* Serialize the object however you like<br />
* Save it to the DB as a string<br />
* Load it from the DB<br />
* Deserialize it however you like<br />
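<br />
The serialize and deserialize steps can be sketched with the standard library alone. The choice of <code>struct</code> plus Base64 here is only one option and an assumption of this example, as are the helper names <code>encode_doubles</code> and <code>decode_doubles</code>. The resulting string is what would be passed to <code>add_condition</code> for a BLOB_FIELD condition.<br />
<br />
```python
import base64
import struct


def encode_doubles(values):
    """Pack a list of floats into bytes and return them as an ASCII string."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text):
    """Reverse encode_doubles: ASCII string -> list of floats."""
    raw = base64.b64decode(text.encode("ascii"))
    # Each little-endian double takes 8 bytes
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


text = encode_doubles([0.5, 1.25, 2.0])
print(decode_doubles(text))  # [0.5, 1.25, 2.0]
```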
<br />
<br />
But what is the difference between STRING_FIELD, JSON_FIELD and BLOB_FIELD?<br />
<br />
<br />
There is no difference in terms of storing the data. The Condition class, like the database table, has a ''text_value''<br />
field where text/string data is stored. The ONLY difference is how these fields are treated and presented in the GUI.<br />
<br />
* '''STRING_FIELD''' - is considered to be a human-readable string.<br />
<br />
* '''JSON_FIELD''' - is considered to be JSON, which is colored and formatted accordingly.<br />
<br />
* '''BLOB_FIELD''' - is considered to be neither a readable string nor JSON, but it still has to be converted to some string. And I hope it will never be used.<br />
<br />
<br />
<br />
<br />
== Replacing previous values ==<br />
<br />
What if the condition value for this run with this name already exists in the DB?<br />
<br />
In general, to replace a value, the ''replace=True'' parameter should be set in ''add_condition''.<br />
<br />
For a single value per run:<br />
<br />
# If the run has this condition with the same value and time, no exception is raised and the function does nothing.<br />
# If the value OR the time differs from what is in the DB, the function checks the ''replace'' flag and behaves accordingly.<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
db.add_condition(1, "event_count", 1000) # First addition to DB<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. OverrideConditionValueError<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value<br />
print(db.get_condition(1, "event_count"))<br />
# value: 2222<br />
# time: None<br />
<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "timed", 1, time1) # First addition to DB<br />
db.add_condition(1, "timed", 1, time1) # Ok. Do nothing<br />
db.add_condition(1, "timed", 1, time2) # Error. Time is different<br />
db.add_condition(1, "timed", 5, time1) # Error. Value is different<br />
db.add_condition(1, "timed", 5, time2, True) # Ok. Value replaced<br />
<br />
print(db.get_condition(1, "timed"))<br />
# value: 5<br />
# time: time2<br />
</syntaxhighlight><br />
<br />
<br />
If many condition values are allowed for the run (is_many_per_run=True):<br />
<br />
# If the run has this condition with the same value and the same time, the function DOES NOTHING.<br />
# If the run has this condition but at a different time, the new value is added to the DB.<br />
# If the run has this condition at this time but with a different value, the function checks the ''replace'' flag, as in the single-value case.<br />
<br />
<br />
Example:<br />
<br />
<syntaxhighlight lang="python"><br />
from datetime import datetime<br />
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)<br />
time2 = datetime(2015, 9, 1, 14, 21, 1, 333)<br />
db.add_condition(1, "event_count", 1000) # First addition to DB. Time is None<br />
db.add_condition(1, "event_count", 1000) # Ok. Do nothing, such value already exists<br />
db.add_condition(1, "event_count", 2222) # Error. Another value for time None<br />
db.add_condition(1, "event_count", 2222, replace=True) # Ok. Replacing existing value for time None<br />
db.add_condition(1, "event_count", 3333, time1) # Ok. Value for time1 is added to DB<br />
db.add_condition(1, "event_count", 4444, time1) # Error. Value differs for time1<br />
db.add_condition(1, "event_count", 4444, time2)           # Ok. Add 4444 for time2 to DB<br />
<br />
print(db.get_condition(1, "event_count"))<br />
# [0: value=2222; time=None<br />
# 1: value=3333; time=time1<br />
# 2: value=4444; time=time2]<br />
</syntaxhighlight><br />
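<br />
The rules for both modes can be summarized in a toy model. This is only a sketch of the behaviour described above, not RCDB's implementation: the dictionary <code>store</code>, the return strings and the <code>many_per_run</code> argument are assumptions made for illustration, and the exception class is a local stand-in.<br />
<br />
```python
from datetime import datetime


class OverrideConditionValueError(Exception):
    """Local stand-in for the error named in the examples above."""


def add_condition(store, run, name, value, time=None, replace=False, many_per_run=False):
    """Toy model of the replacement rules; `store` maps a key to (value, time)."""
    # With many values per run, the time becomes part of the key
    key = (run, name, time) if many_per_run else (run, name)
    if key not in store:
        store[key] = (value, time)       # first addition
        return "added"
    if store[key] == (value, time):
        return "unchanged"               # same value and time: do nothing
    if not replace:
        raise OverrideConditionValueError(key)
    store[key] = (value, time)           # replace flag set: overwrite
    return "replaced"


store = {}
time1 = datetime(2015, 9, 1, 14, 21, 1, 222)
print(add_condition(store, 1, "event_count", 1000, many_per_run=True))                # added
print(add_condition(store, 1, "event_count", 1000, many_per_run=True))                # unchanged
print(add_condition(store, 1, "event_count", 3333, time1, many_per_run=True))         # added
print(add_condition(store, 1, "event_count", 2222, replace=True, many_per_run=True))  # replaced
```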
<br />
<br />
<br />
<br />
== SQLAlchemy ==<br />
SQLAlchemy links Python classes to the related database tables. It loads data from the DB into objects and, when<br />
objects are changed, can commit the changes back to the DB. SQLAlchemy also ties the classes together, which makes it possible to<br />
navigate between objects.<br />
<br />
Let's see a code example:<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
<br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# get Run object for the run number 1<br />
run = db.get_run(1)<br />
<br />
# now we have access to all conditions for that run as<br />
run.conditions<br />
<br />
# get all condition names or all condition values<br />
<br />
names = [condition.name for condition in run.conditions]<br />
values = [condition.value for condition in run.conditions]<br />
</syntaxhighlight><br />
<br />
SQLAlchemy queries the database only when needed. So when you do <code>run = db.get_run(1)</code>, the ''Run.conditions''<br />
collection is not yet loaded from the DB. It is not loaded even by an assignment like <code>x = run.conditions</code>. But the first time<br />
a real value is needed, the database is queried for all conditions of that run.<br />
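<br />
The lazy loading described here can be imitated with a small toy class. This is a sketch of the general pattern only, not SQLAlchemy's mechanism: <code>LazyConditions</code>, <code>load_conditions</code> and <code>query_log</code> are hypothetical names, and the "query" is simulated.<br />
<br />
```python
class LazyConditions(object):
    """Toy lazy collection: runs `loader` only when items are first needed."""

    def __init__(self, loader):
        self._loader = loader
        self._items = None           # nothing loaded yet

    def __iter__(self):
        if self._items is None:      # first real use triggers the load
            self._items = self._loader()
        return iter(self._items)


query_log = []                       # records every simulated database query


def load_conditions():
    query_log.append("SELECT conditions for run 1")
    return [("event_count", 1000), ("data_value", 1.5)]


conditions = LazyConditions(load_conditions)
x = conditions                       # plain assignment: no query yet
print(len(query_log))                # 0
names = [name for name, _ in conditions]
values = [value for _, value in conditions]
print(len(query_log))                # 1, the second loop reuses the loaded items
```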
<br />
<br />
<br />
== Editing or deleting objects ==<br />
<br />
Even though overriding existing values is possible in RCDB, deleting data or editing existing condition types<br />
should generally be avoided. But sometimes it is needed, especially during the development/debugging phase.<br />
<br />
<br />
To edit or delete things, the SQLAlchemy '''session''' object can be used.<br />
<br />
<br />
=== Editing ===<br />
<br />
'''Edit condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.value_type = ConditionType.JSON_FIELD<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
<br />
'''Rename condition type'''<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# Change what you need<br />
condition_type.name = "new_var"<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
The magic is that all data for all runs is now accessible as '''new_var'''.<br />
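<br />
A plausible reason the rename reaches every run at once is that condition values refer to their type by ID rather than by name. The sketch below shows that pattern with the standard <code>sqlite3</code> module. The two-table layout and the helper <code>values_by_name</code> are assumptions made for illustration and are not taken from the RCDB schema.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: each value points at its type by id, not by name
conn.executescript("""
    CREATE TABLE condition_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE conditions (run_number INTEGER, condition_type_id INTEGER, int_value INTEGER);
    INSERT INTO condition_types (id, name) VALUES (1, 'my_var');
    INSERT INTO conditions VALUES (1, 1, 10);
    INSERT INTO conditions VALUES (2, 1, 20);
    INSERT INTO conditions VALUES (3, 1, 30);
""")


def values_by_name(name):
    """Return (run_number, int_value) pairs for the named condition type."""
    rows = conn.execute(
        "SELECT c.run_number, c.int_value FROM conditions c "
        "JOIN condition_types t ON t.id = c.condition_type_id "
        "WHERE t.name = ? ORDER BY c.run_number", (name,))
    return rows.fetchall()


# Renaming touches exactly one row in condition_types
conn.execute("UPDATE condition_types SET name = 'new_var' WHERE name = 'my_var'")
print(values_by_name("new_var"))   # values of all three runs
print(values_by_name("my_var"))    # nothing is left under the old name
```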
<br />
<br />
=== Deleting ===<br />
<br />
Deleting objects is done with the <code>session.delete</code> function:<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
condition_type = db.get_condition_type("my_var")<br />
<br />
# mark the object for deletion<br />
db.session.delete(condition_type)<br />
<br />
# Calling session commit will save changes to database<br />
db.session.commit()<br />
</syntaxhighlight><br />
<br />
More about the session and manipulating SQLAlchemy objects with it can be found in the<br />
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation].<br />
<br />
<br />
<br />
<br />
<br />
== Database querying ==<br />
<br />
<br />
=== Working with runs ===<br />
If you ever want to get a Run object by run_number, here is how:<br />
<br />
<syntaxhighlight lang="python"><br />
run = db.get_run(run_number)<br />
print(run.number)<br />
print(run.start_time)<br />
print(run.end_time)<br />
print(run.conditions)   # conditions are covered further below<br />
</syntaxhighlight><br />
<br />
How to query runs is shown below.<br />
<br />
<br />
=== Get runs by number (or introduction to SQLAlchemy queries) ===<br />
<br />
Let's select all runs with run_number < 100 using SQLAlchemy:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).filter(Run.number < 100)<br />
<br />
# get count of selected runs<br />
print(query.count())<br />
<br />
# get first run from selected<br />
print(query.first())<br />
<br />
# get all runs that match the criteria<br />
print(query.all())<br />
</syntaxhighlight><br />
What happened?<br />
<br />
'''db.session''' - gets the SQLAlchemy ''session'' object<br />
<br />
'''.query(Run)''' - here we say that we want Run objects to be returned. At the same time it determines which table is queried<br />
<br />
'''.filter(Run.number < 100)''' - the filtering clause<br />
<br />
Once the query is ready, we can actually get the objects with <code>query.first()</code> or <code>query.all()</code><br />
(there are more such methods) or just count the runs with <code>query.count()</code>.<br />
<br />
We can use Run.conditions to get the conditions for each run. Let's see a more advanced example:<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
from sqlalchemy import desc<br />
<br />
# create query<br />
query = db.session.query(Run)\<br />
    .filter(Run.number.between(50, 55))\<br />
    .order_by(desc(Run.number))<br />
<br />
# get all such runs<br />
runs = query.all()<br />
for run in runs:<br />
    # unpack the single event_count value of this run<br />
    event_count, = (condition.value for condition in run.conditions if condition.name == 'event_count')<br />
</syntaxhighlight><br />
<br />
It works and looks easy. But there is one drawback: each selected run issues a separate SELECT query to the DB to get its<br />
conditions. It might be OK in many cases.<br />
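<br />
The cost of one SELECT per run can be made visible with a small experiment on the standard <code>sqlite3</code> module. This is a sketch under assumptions: the two tables are a hypothetical minimal layout, not the RCDB schema, and <code>run_sql</code> is a helper introduced here only to count the statements sent to the database.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal layout, for illustration only
conn.executescript("""
    CREATE TABLE runs (number INTEGER PRIMARY KEY);
    CREATE TABLE conditions (run_number INTEGER, name TEXT, value INTEGER);
""")
for n in range(50, 56):
    conn.execute("INSERT INTO runs VALUES (?)", (n,))
    conn.execute("INSERT INTO conditions VALUES (?, 'event_count', ?)", (n, n * 10))

issued = []   # every statement sent through run_sql is recorded here


def run_sql(sql, params=()):
    issued.append(sql)
    return conn.execute(sql, params).fetchall()


# Pattern from the text: one query for the runs, then one more per run
per_run = {}
for (number,) in run_sql("SELECT number FROM runs WHERE number BETWEEN 50 AND 55"):
    rows = run_sql(
        "SELECT value FROM conditions WHERE run_number = ? AND name = 'event_count'",
        (number,))
    per_run[number] = rows[0][0]
count_per_run = len(issued)

# Alternative: a single JOIN returns the same data in one query
del issued[:]
joined = dict(run_sql(
    "SELECT r.number, c.value FROM runs r "
    "JOIN conditions c ON c.run_number = r.number "
    "WHERE c.name = 'event_count' AND r.number BETWEEN 50 AND 55"))
count_joined = len(issued)

print(count_per_run, count_joined)   # 7 1
print(per_run == joined)             # True
```
<br />
For six runs the difference is negligible, which is why the simple loop is fine in many cases. SQLAlchemy also provides eager loading options such as <code>joinedload</code>; whether they suit a particular RCDB query is not covered here.<br />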
<br />
<br />
<br />
=== Raw SQLAlchemy queries ===<br />
<br />
What if we want to select runs by condition values?<br />
<br />
<br />
First, note that since RCDBProvider gives access to the SQLAlchemy session, the full<br />
power of SQLAlchemy queries is available.<br />
<br />
<br />
Let's say we want to get all runs with '''event_count''' > '''100 000''':<br />
<br />
<syntaxhighlight lang="python"><br />
# open database<br />
db = rcdb.RCDBProvider("sqlite:///example.db")<br />
<br />
# create query<br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
.filter(ConditionType.name == "event_count")\<br />
.filter(Condition.int_value > 100000)\<br />
.order_by(Run.number)<br />
<br />
<br />
# get count of selected runs<br />
print(query.count())<br />
<br />
# get first run from selected<br />
print(query.first())<br />
<br />
# get all runs that match the criteria<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
What happened here?<br />
<br />
With the first line:<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)<br />
</syntaxhighlight><br />
<br />
we say that we would like to select Run objects ('''.query(Run)''') and that we will use conditions<br />
and condition types ('''.join(Run.conditions).join(Condition.type)''').<br />
<br />
<br />
Then we filter the results ('''.filter(...)''') and ask for them to be ordered by Run.number ('''.order_by(Run.number)''').<br />
<br />
<br />
All these functions (join, filter, order_by, ...) return a Query object, which allows chaining as many of them as needed.<br />
<br />
<br />
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.<br />
<br />
<br />
But you probably already feel the drawbacks of this approach:<br />
<br />
* First, you have to use int_value to filter conditions. That is in many ways worse than using the Condition.value property, which handles the type automatically.<br />
* Another drawback is that when you add more logic, the query becomes bulky.<br />
<br />
<br />
Let's imagine the next example. We look for runs in the range 1000 to 2000 that have event_count > 10000 or some data_value in the range 1.2 to 2.4:<br />
<br />
<syntaxhighlight lang="python"><br />
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\<br />
.filter(Run.number.between(1000, 2000))\<br />
.filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) |<br />
((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\<br />
.order_by(Run.number)<br />
<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
Note that instead of the common '''&&''' and '''||''', '''&''' and '''|''' are used.<br />
SQLAlchemy overloads these operators to combine filter expressions.<br />
<br />
Note also that such expressions must be in parentheses, because in Python '''&''' and '''|''' bind more tightly than comparison operators. It is possible to use the '''or_''' and '''and_''' functions<br />
instead, but it doesn't improve the readability.<br />
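<br />
How operator overloading turns a Python expression into a filter clause can be shown with a toy class. This is a sketch of the general technique and not SQLAlchemy's implementation: <code>Expr</code> and the column names are hypothetical, and the "SQL" it produces is plain text.<br />
<br />
```python
class Expr(object):
    """Toy expression node that renders itself as SQL-like text."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr("{} > {}".format(self.text, other))

    def __eq__(self, other):
        return Expr("{} = '{}'".format(self.text, other))

    def __and__(self, other):            # called for the & operator
        return Expr("({} AND {})".format(self.text, other.text))

    def __or__(self, other):             # called for the | operator
        return Expr("({} OR {})".format(self.text, other.text))


name = Expr("name")
int_value = Expr("int_value")
float_value = Expr("float_value")

# Each comparison is parenthesized so that it is evaluated before & and |
clause = ((name == "event_count") & (int_value > 10000)) | \
         ((name == "data_value") & (float_value > 1.2))
print(clause.text)
```
<br />
Without the inner parentheses Python would try to evaluate <code>"event_count" & int_value</code> first, which is not what is meant.<br />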
<br />
<br />
<br />
=== Querying using RCDB helpers ===<br />
<br />
RCDB's ConditionType provides helpful properties to make querying easier.<br />
<br />
<syntaxhighlight lang="python"><br />
# get condition type<br />
t = db.get_condition_type("event_count")<br />
<br />
# select runs where event_count > 1000<br />
query = t.run_query.filter(t.value_field > 1000)<br />
<br />
print(query.all())<br />
</syntaxhighlight><br />
<br />
<br />
What happened?<br />
<br />
*'''run_query''' - returns a query bootstrap that selects Run objects for the given type. So it hides this part of the raw query above:<br />
<br />
<syntaxhighlight lang="python"><br />
...query(Run).join(Run.conditions).join(Condition.type) ... .filter(ConditionType.name == "event_count")<br />
</syntaxhighlight><br />
<br />
<br />
*'''value_field''' - returns the right Condition.xxx_value for the given type. When you write '''t.value_field > 1000''', ConditionType '''t''' looks at its '''value_type''' and selects the right field, here Condition.int_value, to compare.<br />
<br />
<br />
But there is a limitation: each condition type needs its own query. The queries can be combined later with the '''union''' or<br />
'''intersect''' methods.<br />
<br />
<br />
Let's look at an example where we fill the DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py<br />
<br />
<syntaxhighlight lang="python"><br />
import rcdb<br />
from rcdb.model import ConditionType, Run<br />
<br />
# create in memory SQLite database<br />
db = rcdb.RCDBProvider("sqlite://")<br />
rcdb.model.Base.metadata.create_all(db.engine)<br />
<br />
# create conditions types<br />
event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD, False)<br />
data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD, False)<br />
<br />
# create runs and fill values<br />
for i in range(0, 100):<br />
    db.create_run(i)<br />
    db.add_condition(i, event_count_type, i + 950)         # event_count in range 950 - 1049<br />
    db.add_condition(i, data_value_type, (i/100.0) + 1)    # data_value in range 1.0 - 1.99<br />
<br />
<br />
# Demonstrates ConditionType query helpers<br />
event_count_type = db.get_condition_type("event_count")<br />
data_value_type = db.get_condition_type("data_value")<br />
<br />
# select runs where event_count > 1000 and run number <= 53<br />
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <= 53)<br />
print(query.all())<br />
<br />
# select runs where 1.52 <= data_value <= 1.7 and run number < 55<br />
query2 = data_value_type.run_query\<br />
    .filter(data_value_type.value_field.between(1.52, 1.7))\<br />
    .filter(Run.number < 55)<br />
print(query2.all())<br />
<br />
# combine results of these two queries<br />
print("Results intersect:")<br />
print(query.intersect(query2).all())<br />
print("Results union:")<br />
print(query.union(query2).all())<br />
</syntaxhighlight><br />
<br />
The output is:<br />
<br />
<pre><br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>]<br />
[<Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
<br />
Results intersect:<br />
[<Run number='52'>, <Run number='53'>]<br />
<br />
Results union:<br />
[<Run number='51'>, <Run number='52'>, <Run number='53'>, <Run number='54'>]<br />
</pre><br />
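<br />
The output above can be cross-checked without a database. The sketch below rebuilds the same dummy data in plain Python and applies the same filters with set operations; it illustrates what '''intersect''' and '''union''' compute and does not use RCDB or SQLAlchemy.<br />
<br />
```python
# Same dummy data as in the example: one record per run number 0..99
runs = [{"number": i, "event_count": i + 950, "data_value": (i / 100.0) + 1}
        for i in range(0, 100)]

# event_count > 1000 and run number <= 53
first = {r["number"] for r in runs
         if r["event_count"] > 1000 and r["number"] <= 53}

# 1.52 <= data_value <= 1.7 (BETWEEN is inclusive) and run number < 55
second = {r["number"] for r in runs
          if 1.52 <= r["data_value"] <= 1.7 and r["number"] < 55}

print(sorted(first))            # [51, 52, 53]
print(sorted(second))           # [52, 53, 54]
print(sorted(first & second))   # intersect: [52, 53]
print(sorted(first | second))   # union: [51, 52, 53, 54]
```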
<br />
<br />
More on SQLAlchemy queries:<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial]<br />
* [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]<br />
<br />
<br />
The example is available as<br />
<syntaxhighlight lang="bash"><br />
python $RCDB_HOME/python/example_conditions_query.py<br />
</syntaxhighlight><br />
(It creates an in-memory database, so there is no need to run create_empty_sqlite.py)<br />
<br />
<br />
<br />
<br />
== Logging ==<br />
<br />
RCDB has a logging system that stores some information about what is going on in the same database, in the ''log_records''<br />
table.<br />
<br />
<br />
Set the '''RCDB_USER''' environment variable to have your name in the logs (or set it manually in the API as shown below).<br />
<br />
<br />
* Creating condition types is logged automatically<br />
* Condition value manipulations are not logged<br />
<br />
This is done on the assumption that the database has many runs and each run has many condition values,<br />
so if each condition value creation produced a text log message, the database would be bloated with log records.<br />
<br />
<br />
On the other hand, when you do a series of operations with conditions, it may be a good idea to leave a<br />
log message that can be seen by other users.<br />
<br />
<br />
Custom data modification through SQLAlchemy, such as creating or deleting objects manually with session.commit(), is not<br />
logged either, so leaving a log record is up to the user here too.<br />
<br />
<br />
How to leave a log record:<br />
<br />
<syntaxhighlight lang="python"><br />
# set the RCDB_USER environment variable to give RCDB your user name<br />
# another option is to give it in constructor<br />
db = RCDBProvider("sqlite:///example.db", user_name="john")<br />
<br />
# and one more option of setting user name<br />
db.user_name = "john"<br />
<br />
# simplest log version<br />
db.add_log_record(None, "Hello everybody! You'll see this message in logs on RCDB site", 0)<br />
</syntaxhighlight><br />
<br />
The first argument, None, means there is no specific database object ID for this message. The last argument, 0, means there is no specific run number for this message.<br />
<br />
<br />
<br />
<br />
== Support ==<br />
Dmitry Romanov <[mailto:romanov@jlab.org romanov@jlab.org]><br />
<br />
DescriptionDescription of how to manage RCDB run conditions using python API</div>Romanov