![instinct-vfx] @instinct-vfx: I have a bit of a bigger topic. No idea if anyone faces something like that at all here: How to handle remote impersonation and/or credentials in a render-farm scenario?
Currently we use Qube as a render manager, and Qube has a feature that lets render jobs be executed as the user that submitted them. That is not directly possible on Windows, because - as opposed to Linux - you cannot run a process under a different user without their credentials. Qube (and Deadline) get around this by letting users register their Windows password, encrypted, in their DB and then using that information to launch processes under that user. We need this because we are not allowed to have a generic render user that has access to all projects; we must control project access per user individually.
Questions i have:
• Does anyone know if any other render manager supports this besides Qube and Deadline?
• In addition to plain share access, we have an increasing number of central services that also need authentication. For some we were able to put Kerberos forwarding in place and piggyback on the above-mentioned mechanism. But that’s cumbersome, does not always work, and Kerberos is also slated for deprecation. Anyone know of solutions that let me hand credentials to a render job process in a way that prevents compromising the credentials? (E.g. if i add that information to a job, then everyone with access to the queue can read the credentials and potentially abuse them.)
![Andre_Anjos] @Andre_Anjos: @instinct-vfx Just for my understanding… In what circumstances would the impersonator be able to access the render farm? Are you talking about someone internal, if they are aware of the credentials for a specific user?
![instinct-vfx] @instinct-vfx: To clarify: A user submits a job to the farm, the farm needs to be able to run this job on a different computer (=render node) but as the user that submitted it. So the farm uses impersonation to run as a user. The impersonator is the render farm basically.
![minkiu] @minkiu: Hmm that sounds like quite the pickle, maybe have a pre-job/task that adds the credential as an ENV VAR , so it sets it on runtime, and the tool can then use the credential, but it’s not in the job properties environment maybe…
![instinct-vfx] @instinct-vfx: The main problem is preventing people with access to the render manager from extracting information
![minkiu] @minkiu: > access to the render manager
Is this unfettered access?
Even so, with the idea I mentioned they would still be unable to get the credential, no?
1) Job preload: query Vault or whatever to get credentials and set them as env vars on the current process 2) your tool now has access to the env var so it can use it
So unless the user with farm access can inspect processes they shouldn’t be able to see it, no?
![instinct-vfx] @instinct-vfx: Looked into all kinds of setups, like HashiCorp Vault and other credential stores. But in a standard farm setup that does not really work. It just increases your attack surface.
![Andrew_Golubev] @Andrew_Golubev: there is no ability to set up a specific role?
where y can setup specific role, and lock only sending but not retrieve data
![instinct-vfx] @instinct-vfx: But how do you get access to vault? That’s typically a token. But if that token is stored in the job, then everyone with access will see the token, and can in turn use the token to pull the actual credentials
I don’t know of any render manager that allows storing data in jobs that is not visible. Unless you build your own encryption and simply submit encrypted data. But then you also need a means to decrypt it
![minkiu] @minkiu: ask for passwords on submission?
and encrypt it like you say
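The encrypt-at-submission flow could be sketched like this. The XOR pad below is a deliberately toy cipher just to show the round trip; a real setup would use an authenticated cipher such as AES-GCM (e.g. Fernet from the `cryptography` package), and the assumption that only the render nodes hold the key is mine, not a feature of any render manager:

```python
import secrets

def xor_pad(key: bytes, data: bytes) -> bytes:
    # Toy XOR one-time pad purely for illustration -- substitute a
    # real authenticated cipher in practice.
    assert len(key) >= len(data), "pad must cover the payload"
    return bytes(k ^ b for k, b in zip(key, data))

# Submission side: encrypt before the credential enters the job
# payload, so anyone browsing the queue sees only ciphertext.
key = secrets.token_bytes(64)        # delivered to nodes out of band
job_payload = xor_pad(key, b"hunter2")

# Node side: decrypt just-in-time inside the running task.
assert xor_pad(key, job_payload) == b"hunter2"
```

This still leaves the key-distribution problem the thread keeps circling back to: the nodes need the decryption key without the submitters or queue readers getting it.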
![bob.w] @bob.w: Every user gets their own virtual farm?
![minkiu] @minkiu: specific farm per job, sorted
![instinct-vfx] @instinct-vfx: That’s what Qube and Deadline basically do. But that only works if encryption is a feature of the render manager
Bob is spot on. That’s how it works in the cloud. Every job gets its own farm potentially
And in a cloud scenario Vault works perfectly fine because it is easy to encapsulate all of this completely away from the user
But that falls apart if you need to add local machines, and even worse, have workstations render at night where people have physical (or at least full remote) access
![bob.w] @bob.w: Oh yeah, mixing and matching hardware seems gross
![dhruv] @dhruv: Also stuff like vault etc work well in containers. But containers aren’t feasible with multiple OS support or GPU access
![instinct-vfx] @instinct-vfx: Or Windows Support with GPU to make Unreal happy
![dhruv] @dhruv: IMHO I’m sort of moving away from the idea of spoofing the user. As long as I can make sure the data is readable by the user, it’s all good.
The big thing is bits that interact with a server that needs auth. I haven’t implemented it yet, but I think what I’d want to do is use temporary oauth tokens scoped to the duration of the job
That way there’s limited risk of exposed tokens.
The token store could be accessible only to the farm user and when the job fails or finishes, token is no longer valid
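The job-scoped token idea above might be sketched as follows. The class and method names are illustrative, not any render manager's real API; in practice the store would sit behind the farm service and `revoke_job` would hang off the job's completion/failure callback:

```python
import secrets
import time

class JobTokenStore:
    """Sketch: tokens are minted when a job starts and revoked the
    moment it finishes or fails, so a leaked token is only useful
    while the job is live. All names here are hypothetical."""

    def __init__(self) -> None:
        self._live: dict[str, tuple[str, float]] = {}  # token -> (job_id, expiry)

    def issue(self, job_id: str, ttl_s: float = 3600.0) -> str:
        token = secrets.token_urlsafe(32)
        self._live[token] = (job_id, time.monotonic() + ttl_s)
        return token

    def validate(self, token: str) -> bool:
        entry = self._live.get(token)
        return entry is not None and time.monotonic() < entry[1]

    def revoke_job(self, job_id: str) -> None:
        # Called from the job's post-task hook on success or failure.
        self._live = {t: v for t, v in self._live.items() if v[0] != job_id}
```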
![instinct-vfx] @instinct-vfx: Well, jobs can spend quite a while on the farm. It’s tricky to find the right timeout, if you will. Also that still leaves a lot of time to access data. There is no single farm user, and that would also cause quite a lot of effort elsewhere should we want to implement that. The biggest issue is that all render managers i know are simply not designed to do that. And it is not really possible to piggyback something on top of the existing system. At least i have not found any.
Without impersonation i would have to somehow manage storage access (that is manage AD ACLs on an Isilon in this case), or manage machine users per job, but then you are basically back to impersonation. No matter how i look at it, it makes me want to cry
![dhruv] @dhruv: Yeah currently every farm basically assumes trust from everyone, with some basic permission based scoping
With regards to the token being around for a long time, that’s why I think it should only be exposed to the farm user.
![instinct-vfx] @instinct-vfx: Most managers do not really support confidential information in jobs. And as there is no single user there is no real way to distinguish between the real user and the farm impersonated one.
Due to the crazy volatility and the amount of parallel projects and clients and the respective security requirements this is a huge issue for us.
The best would actually be if i could manage local resources like the cloud, with containerization and full windows + GPU support
That would solve a LOT of my problems in one go. But i have low hope
![Andrew_Golubev] @Andrew_Golubev: @instinct-vfx as you explain more, I’m starting to understand how much deeper this task goes
![Allen_Rose] @Allen_Rose: @instinct-vfx, If you have any revelations (good or bad) please let us know. I, for one, will probably end up having to solve this issue later in the year.
![instinct-vfx] @instinct-vfx: Will definitely do. Our current approach is to piggyback everything on the user account. This is supported in Qube directly and also gives us a place for user-specific secrets (namely their user folder). But it’s bulky, the implementation is lacking, and it does not generalize well to other types of services. Also currently evaluating different possibilities for the future, so this is one of the major concerns i have to take into account
![Allen_Rose] @Allen_Rose: Currently we’re running a “v0” pipeline (just whatever band-aids were necessary so that people could work). We’re in the middle of building out a new services-based backend, but need to do that in an air-gapped environment. Our Kubernetes guru has pitched using it to replace Deadline (our current farm solution), but not sure how feasible that actually is.
We’ve hit similar issues with ftrack, as their API requires a user key. So you either need to use a generic one (and then everyone can do everything) or manage that separately in some way. To be fair, ftrack-connect helps with this in some cases.
![dhruv] @dhruv: Kubernetes can help with the management of the jobs. However it won’t give you all the other reporting infrastructure and task dependency management. You’d be reinventing a lot around it
And you’d be restricted in what OS you can run. E.g. you’d basically be giving up Windows and macOS support, so best to know up front that you’re only going to need to run Linux stuff
![instinct-vfx] @instinct-vfx: What @dhruv said. I have also had discussions around that and there is a lot that is missing from K8s unless you actually manage dynamic resources in a hands-off way. Which a render farm is often not
![Allen_Rose] @Allen_Rose: We’re writing a custom process dependency graph service, as we want to be able to explicitly track/manage the flow of data/files. So I’m not too concerned about losing the native job dependencies that Deadline provides.
You’re right about the OSes. That’s been top of mind for us lately. We’re currently Windows-based (as that’s what corporate understood); we’re planning on moving as much as we can to Linux soon, but acknowledge that we’ll likely always have some Windows holdouts.
I imagine a first/halfway step is migrating to submitting containers to Deadline.
You bring up a good point about how actively the farm is managed. Personally, I’d love for no one to ever touch anything after submission (but that’s mostly based on my lack of desire to do any management myself)
![instinct-vfx] @instinct-vfx: I share that sentiment. I don’t think it is realistic though unless you are in an elastic environment that can ramp resources up/down on its own.
And while technically K8S could do that, if you actually wanted to do it you would still need to buy “peak load amount” nodes
And you would still see fighting over those resources
It works in the cloud because you do not need to worry about physical infra, racks, machines, network etc. You can ramp things up as needed and trash them when done. It does not matter (that’s a bit of an oversimplification, but still) whether you render 100h on 1 machine or 1h on 100 machines.
Hence you do not really need to manage scheduling, distribution of compute to projects etc. Instead you need to manage cost, forecasting, and shutting things down efficiently
![Allen_Rose] @Allen_Rose: You’re absolutely right. If resources are limited, then you need some method/functionality to determine who gets what. Then, regardless, there will also be higher level conversations about how much compute is being used, what kind of turn-around time is ideal/acceptable… There’s a lot to unpack.
![instinct-vfx] @instinct-vfx: Yeah, and while on paper this looks like a “we will just assign a % of resources to each project and manage that”, reality tends to get a lot more complex. As in “but i need to re-render this NOW on all machines as i have a deadline”. And there is no real way to solve that without having a conversation between the corresponding project managers i would say