HOWTO Copy data offsite using Globus

From GlueXWiki

Here are some instructions for setting up a personal Globus endpoint on your desktop/laptop/server so you can transfer files efficiently to/from JLab. Note that for this to work, you do need a Globus account. You can create an account for free (a paid account is not required).

NOTE: While there is an option to log in using things like your Google account, you should use "CILogon", since otherwise you will not be able to access the JLab endpoint(s).


Setting up the GUI

Instructions for downloading and installing the Globus Connect Personal endpoint can be found here:

Globus Connect Personal

This is the quickest way to start transferring files to your local machine. You'll just need to authenticate the jlab#scifiles endpoint.

If you need to automate the process in some way, you can use the command line interface (CLI) as shown below.


Setting up the CLI

See the full documentation on the Globus CLI page.

These instructions are for setting up the command line interface (CLI) for transferring files between your local desktop/laptop/server and the JLab Data Transfer Node (DTN). This can be done from Linux, Mac, or even from within a Docker container. Note that the CLI is installed as a Python package, so it can be installed either at the system level or in a virtual environment. These instructions use a virtual environment so they can be followed from an unprivileged account.

Using a Docker container

   > docker run -it --rm  --platform linux/amd64 centos:7.9.2009 bash
   # localedef -i en_US -f UTF-8 en_US.UTF-8
   # echo 'export LC_ALL=en_US.UTF-8' >> /etc/bashrc
   # yum install -y wget python3
   # adduser globusme
   # su globusme
   $ cd
   $ python3 -m venv venv
   $ source venv/bin/activate
   (venv) $ pip install --upgrade pip
   (venv) $ pip install globus-cli
   (venv) $ globus login --no-local-server
   Please authenticate with Globus here:
   ------------------------------------
   https://auth.globus.org/v2/oauth2/authorize?client_id=....
   ------------------------------------

The globus login command above prints a URL to copy and paste into your browser. This can be any browser on *any* machine where you can log in to your Globus account; you do not need to do it from within the Docker container if you are setting up the CLI there. On the authorization page, allow access and it will display an authorization code. Copy that code, paste it back at the command-line prompt, and hit return.

   Enter the resulting Authorization Code here: <paste code from the browser window>
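If you plan to script these steps, it can help to check for a valid login before doing anything else. The bash function below is a minimal sketch; require_globus_login is a hypothetical helper name, and it assumes that globus whoami exits with a non-zero status when you are not logged in.

```shell
#!/usr/bin/env bash
# require_globus_login (hypothetical helper)
# Returns 0 if the Globus CLI has a usable login, otherwise prints a hint and
# returns 1. ASSUMPTION: "globus whoami" fails (non-zero exit) when not logged in.
require_globus_login() {
  if ! globus whoami >/dev/null 2>&1; then
    echo "Not logged in to Globus; run: globus login --no-local-server" >&2
    return 1
  fi
}
```

A script can then start with require_globus_login || exit 1 so it stops early instead of failing on the first transfer.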

At this point you should be logged in. You now need to set up an instance of Globus Connect Personal to create an endpoint on the local node. For Linux, download the software with:

   (venv) $ wget https://downloads.globus.org/globus-connect-personal/v3/linux/stable/globusconnectpersonal-latest.tgz
   (venv) $ tar xzf globusconnectpersonal-latest.tgz
   (venv) $ cd globusconnectpersonal-*/
   (venv) $ ./globusconnect
   Detected that setup has not run yet, and '-setup' was not used
   Will now attempt to run
     globusconnectpersonal -setup
   Globus Connect Personal needs you to log in to continue the setup process.
   
   We will display a login URL. Copy it into any browser and log in to get a
   single-use code. Return to this command with the code to continue setup.
   
   Login here:
   -----
   https://auth.globus.org/v2/oauth2/authorize?client_id=...
   -----
   Enter the auth code: <paste code from the browser window>

As before, copy the URL into a browser to authenticate and paste the resulting code back at the command line. It will then prompt you for a name for your local endpoint. Give it a recognizable name and hit enter. It will then print several messages, including an endpoint ID, similar to this:

   Input a value for the Endpoint Name: MyDesktopDocker
   registered new endpoint, id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   setup completed successfully
   
   Will now start globusconnectpersonal in GUI mode
   Graphical environment not detected
   
   To launch Globus Connect Personal in CLI mode, use
     globusconnectpersonal -start
   
   Or, if you want to force the use of the GUI, use
     globusconnectpersonal -gui

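To make things easier, store the local endpoint ID in an environment variable. Then start the personal Globus endpoint. Since -start keeps running in the foreground, the trailing & puts it in the background so you can keep using the same shell:

   (venv) $ export EP_LOCAL=$(globus endpoint local-id)
   (venv) $ ./globusconnectpersonal -start &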
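When running unattended, you may want to confirm that the endpoint has actually connected before submitting transfers. The sketch below wraps the -status option of globusconnectpersonal. gcp_is_connected is a hypothetical name, and the pattern it matches assumes the status output contains a line like "Globus Online: connected", so check that against what your version prints.

```shell
#!/usr/bin/env bash
# gcp_is_connected [DIR]  (hypothetical helper)
# Succeeds if the Globus Connect Personal copy in DIR (default: current
# directory) reports that it is connected.
# ASSUMPTION: "-status" prints a line such as
#   Globus Online:   connected
# Adjust the pattern if your version words it differently.
gcp_is_connected() {
  local dir="${1:-.}"
  "$dir/globusconnectpersonal" -status 2>/dev/null \
    | grep -qi '^Globus Online: *connected'
}
```

From inside the Globus Connect Personal directory you could then run:

   (venv) $ gcp_is_connected && echo "endpoint is up"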

At this point, if you go back to your web browser and log in to Globus, you should be able to find your endpoint by typing its name into the "Collection" field of the file browser.

Next, you'll need to get the JLab endpoint id to transfer files from. This is most easily done from the CLI with:

   (venv) $ globus endpoint search jlab#scifiles
   ID                                   | Owner             | Display Name 
   ------------------------------------ | ----------------- | -------------
   e2f4595b-6d04-11e5-ba46-22000b92c6ec | jlab@globusid.org | jlab#scifiles

Put this into an environment variable:

   (venv) $ export EP_JLAB=e2f4595b-6d04-11e5-ba46-22000b92c6ec
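The search and the export can also be combined, which is handy in scripts. This is a sketch using the CLI's --format unix and --jmespath output options; lookup_endpoint_id is a hypothetical helper, and the query assumes the search results come back under DATA with a display_name field for each endpoint.

```shell
#!/usr/bin/env bash
# lookup_endpoint_id NAME  (hypothetical helper)
# Prints the ID of the first endpoint whose display name is exactly NAME.
# ASSUMPTION: search results are returned as DATA[] entries with display_name/id.
lookup_endpoint_id() {
  local name="$1"
  globus endpoint search "$name" \
    --format unix \
    --jmespath "DATA[?display_name=='${name}'].id | [0]"
}
```

Usage:

   (venv) $ export EP_JLAB=$(lookup_endpoint_id 'jlab#scifiles')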

Now, you should be able to browse the JLab files like this:

   (venv) $ globus ls $EP_JLAB:/expphy
   cache/
   volatile/
   volatile-old/
   work/
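Raw-data directories can hold many entries. A small filter like the one below keeps only the EVIO file names. list_evio is a hypothetical helper; it simply pipes the output of globus ls through grep, so it assumes one entry per line as in the listing above.

```shell
#!/usr/bin/env bash
# list_evio ENDPOINT_ID DIR  (hypothetical helper)
# Lists only the *.evio entries in DIR on the given endpoint.
list_evio() {
  globus ls "$1:$2" | grep '\.evio$'
}
```

For example:

   (venv) $ list_evio "$EP_JLAB" /expphy/cache/halld/RunPeriod-2021-11/rawdata/Run090183/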

You can schedule the transfer of a file from JLab to the local machine with something like:

   (venv) $ globus transfer $EP_JLAB:/expphy/cache/halld/RunPeriod-2021-11/rawdata/Run090183/hd_rawdata_090183_001.evio $EP_LOCAL:hd_rawdata_090183_001.evio
   Message: The transfer has been accepted and a task has been created and queued for execution
   Task ID: 48500dd6-493d-11ec-a515-b537d6c07c1d

You can watch the transfer progress on the Globus website, but it sometimes reports spurious warnings or errors, so it is probably best to watch the destination directory directly. Transfers should generally start within several seconds.
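If you would rather follow a transfer from the command line, the CLI can capture the task ID when the transfer is submitted and then block until the task finishes. The function below is a sketch; submit_and_wait is a hypothetical name, and the 15-second polling interval is an arbitrary choice. At any time, globus task show TASK_ID prints the current state of a task.

```shell
#!/usr/bin/env bash
# submit_and_wait SRC DST  (hypothetical helper)
# Submits one transfer, prints its task ID, then waits for the task to finish.
# SRC and DST use the same ENDPOINT_ID:PATH form as "globus transfer".
submit_and_wait() {
  local src="$1" dst="$2" task_id
  # Ask the CLI for just the task_id field instead of the human-readable message
  task_id=$(globus transfer "$src" "$dst" --format unix --jmespath 'task_id') || return 1
  [ -n "$task_id" ] || return 1
  echo "Submitted task $task_id"
  # Poll every 15 seconds until the task completes
  globus task wait "$task_id" --polling-interval 15
}
```

Usage, with the same file as above:

   (venv) $ submit_and_wait "$EP_JLAB:/expphy/cache/halld/RunPeriod-2021-11/rawdata/Run090183/hd_rawdata_090183_001.evio" "$EP_LOCAL:hd_rawdata_090183_001.evio"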

If you need to transfer more than one file in a single task, look at the instructions here:

Globus CLI Batch Transfers
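For the common case of pulling several consecutive files from one run, the batch input can be generated rather than typed by hand. The sketch below assumes the cache path and file-naming pattern from the single-file example above (hd_rawdata_RRRRRR_NNN.evio under RunPeriod-.../rawdata/RunRRRRRR). make_batch_lines is a hypothetical helper that prints one "source destination" pair per line, with destination paths relative to your local endpoint's default directory.

```shell
#!/usr/bin/env bash
# make_batch_lines PERIOD RUN FIRST LAST  (hypothetical helper)
# Prints "source destination" pairs for files FIRST..LAST of RUN, assuming the
# /expphy/cache/halld/<period>/rawdata/RunNNNNNN/ layout and
# hd_rawdata_NNNNNN_MMM.evio naming used in the single-file example.
make_batch_lines() {
  local period="$1" run6 n f
  # 10# forces base 10, so zero-padded input such as 090183 or 008 also works
  run6=$(printf '%06d' "$((10#$2))")
  for ((n = 10#$3; n <= 10#$4; n++)); do
    f=$(printf 'hd_rawdata_%s_%03d.evio' "$run6" "$n")
    printf '/expphy/cache/halld/%s/rawdata/Run%s/%s %s\n' "$period" "$run6" "$f" "$f"
  done
}
```

The output can then be fed to a single batch transfer. In 3.x versions of the CLI, --batch takes a file name and "-" means standard input; older versions used a bare --batch flag, so check the batch transfer instructions above for your version:

   (venv) $ make_batch_lines RunPeriod-2021-11 90183 1 5 | globus transfer "$EP_JLAB" "$EP_LOCAL" --batch -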