Access Controls¶
The access control features of Pachyderm Enterprise let you create and manage various users that interact with your Pachyderm cluster. You can restrict access to individual data repositories on a per user basis and, as a result, restrict the subscription of pipelines to those data repositories.
These docs will guide you through:
- Understanding Pachyderm access controls.
- Activating access control features (aka “auth” features).
- Logging into Pachyderm.
- Managing/updating user access to data repositories.
We will also discuss:
- The behavior of pipelines when using access control
- The behavior of a cluster when access control is de-activated or an enterprise token expires
Understanding Pachyderm access controls¶
Assuming access controls are activated, each data repository (aka repo) in Pachyderm will have an Access Control List (ACL) associated with it. The ACL will include:
- READERs - users who can read the data versioned in the repo.
- WRITERs - users with READER access who can also commit additions, deletions, or modifications of data into the repo.
- OWNERs - users with READER and WRITER access who can also modify the repo’s ACL.
Currently, Pachyderm accounts correspond to GitHub users, who authenticate inside of Pachyderm using OAuth integration with GitHub. Pachyderm user accounts are identified within Pachyderm via their GitHub usernames.
There is a single, hardcoded “admin” group (and no other groups) in Pachyderm. Users in that admin group have the ability to perform any action in the cluster, including appointing other admins. Further, a repo with no ACL can only be managed by the cluster admins.
Activating access control¶
First, you will need to make sure that your cluster has Pachyderm Enterprise Edition activated (you can follow this guide to activate Enterprise Edition). The status of the Enterprise features can be verified by accessing the Pachyderm dashboard or with pachctl
as follows:
$ pachctl enterprise get-state
ACTIVE
Next, we need to activate the Enterprise access control features. This can be done in the dashboard or with pachctl auth activate. However, before executing that command, we should decide on at least one user that will have admin privileges on the cluster. Pachyderm admins will be able to modify the scope of access for any non-admin users on the cluster. All users in Pachyderm are identified by their GitHub usernames.
Activating access controls with the dashboard¶
To activate access controls via the Pachyderm dashboard, go to the settings page where you should see a “Activate Access Controls” button. Click on that button. You will then be able to enter one or more Github users as cluster admins and activate access controls:
After activating access controls, you should see the following screen asking you to login to Pachyderm:
Activating access controls with pachctl¶
To activate access controls on a cluster and set the GitHub user dwhitena
as an admin, we would execute the following pachctl
command:
$ pachctl auth activate --admins=dwhitena
Your Pachyderm cluster can have more than one admin if you like, but you need to supply at least one with this command. To add multiple admins, You would just need to specify them here as a comma separated list.
Logging into Pachyderm¶
Now that we have activated access control, we can login to our cluster. When using the Pachyderm dashboard, you will need to login on the dashboard, and, when using the pachctl
CLI, you will need to login via the CLI.
Login on the dashboard¶
Once you have authorized access controls for Pachyderm, you will need to login to use the Pachyderm dashboard as shown above in this section. To login, click the “Get GitHub token” button. You will then be presented with an option to “Authorize Pachyderm” (assuming that you haven’t authorized Pachyderm on GitHub previously). Once you authorize Pachyderm, you will be presented with a Pachyderm user token:
Copy and paste this token back into the Pachyderm login screen and press enter. You are now logged in to Pachyderm, and you should see your Github avatar and an indication of your user in the upper left hand corner of the dashboard:
Login using pachctl
¶
You can use the pachctl auth login <username>
to login via the CLI. When we execute this command, pachctl
will provide us with a GitHub link to authenticate ourselves as the provided GitHub user, as shown below:
$ pachctl auth login dwhitena
(1) Please paste this link into a browser:
https://github.com/login/oauth/authorize?client_id=d3481e92b4f09ea74ff8&redirect_uri=https%3A%2F%2Fpachyderm.io%2Flogin-hook%2Fdisplay-token.html
(You will be directed to GitHub and asked to authorize Pachyderm's login app on Github. If you accept, you will be given a token to paste here, which will give you an externally verified account in this Pachyderm cluster)
(2) Please paste the token you receive from GitHub here:
When visiting this link in a browser, you will be presented with an option to “Authorize Pachyderm” (assuming that you haven’t authorized Pachyderm via GitHub previously). Once you authorize Pachyderm, you will be presented with a Pachyderm user token:
Copy and paste this token back into the terminal, as requested by pachctl
, and press enter. You are now logged in to Pachyderm!
Managing and updating user access¶
Let’s suppose that we create a repository call test
when we are logged into Pachyderm as the user dwhitena
. Because, the user dwhitena
created this repository, dwhitena
will have full read/write access to the repo. This can be confirmed on the dashboard by navigating to or clicking on the repo test
. The results repo details will show your current access to the repository:
You can also confirm your access via the pachctl auth get ...
command:
$ pachctl auth get dwhitena test`
OWNER
An OWNER of test
or a cluster admin can then set other user’s scope of access to the repo. This can be done via the pachctl auth set ...
command or via the dashboard. For example, to give the GitHub users JoeyZwicker
and msteffen
READER (but not WRITER or OWNER) access to test
and jdoliner
WRITER (but not OWNER) access, we can click on Modify access controls
under the repo details in the dashboard. This will allow us to easily add the users one by one:
Activation code expiration and de-activation¶
When an enterprise activation code expires, an auth-activated Pachyderm cluster goes into an “admin only” state. In this state, only admins will have access to data that is in Pachyderm. This safety measure keeps sensitive data protected, even when an enterprise subscription becomes stale. As soon as the enterprise activation code is updated (via the dashboard or via pachctl enterprise activate ...
), the Pachyderm cluster will return to it’s previous state.
When access controls are de-activated on a Pachyderm cluster via pachctl auth deactivate
, the cluster returns to being a non-access controlled Pachyderm cluster. That is,
- All ACLs are deleted.
- The cluster returns to being a blank slate in regards to access control. Everyone that can connect to Pachyderm will be able to access and modify the data in all repos.
- There will no longer be a concept of users (i.e., no one will be able to login to Pachyderm).