=head1 NAME

Rex::Rancher - zero-touch RKE2 and K3s cluster deployment with Rex

=head1 SYNOPSIS

  # Deploy an RKE2 control plane
  task "deploy_server", sub {
    rancher_deploy_server(
      distribution    => 'rke2',
      domain          => 'k8s.example.com',
      token           => 'my-secret',
      tls_san         => 'k8s.example.com',
      kubeconfig_file => "$ENV{HOME}/.kube/mycluster.yaml",
    );
  };

  # Deploy RKE2 control plane with GPU support
  task "deploy_gpu_server", sub {
    rancher_deploy_server(
      distribution    => 'rke2',
      gpu             => 1,    # requires Rex::GPU installed
      reboot          => 1,    # reboot after driver install (first deploy)
      hostname        => 'gpu-cp-01',
      domain          => 'k8s.example.com',
      token           => 'my-secret',
      tls_san         => 'gpu-cp-01.k8s.example.com',
      kubeconfig_file => "$ENV{HOME}/.kube/gpu-cluster.yaml",
    );
  };

  # Deploy K3s worker with GPU support
  task "deploy_gpu_worker", sub {
    rancher_deploy_agent(
      distribution => 'k3s',
      gpu          => 1,    # requires Rex::GPU installed
      hostname     => 'gpu-01',
      domain       => 'k8s.example.com',
      server       => 'https://10.0.0.1:6443',
      token        => 'K10...',
    );
  };

  # Deploy a single-node cluster (control plane + workloads on same node)
  task "deploy_single_node", sub {
    rancher_deploy_server(
      distribution    => 'rke2',
      token           => 'my-secret',
      tls_san         => '10.0.0.1',
      kubeconfig_file => "$ENV{HOME}/.kube/single.yaml",
    );
    # Remove control-plane taint so workloads can be scheduled
    untaint_node(kubeconfig => "$ENV{HOME}/.kube/single.yaml");
  };
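
Run the tasks above against one or more hosts with the standard C<rex>
command line, for example:

  $ rex -H 10.0.0.1 deploy_single_node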

=head1 DESCRIPTION

L<Rex::Rancher> provides complete, zero-touch Kubernetes cluster deployment
for Rancher distributions (RKE2 and K3s) using the L<Rex> orchestration
framework. It handles everything from raw Linux node preparation through to
a running CNI and GPU device plugin.

GPU support is optional. Pass C<gpu =E<gt> 1> and install L<Rex::GPU>
separately. Rex::Rancher works identically for non-GPU nodes.
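Because the GPU path is opt-in, a Rexfile can enable it only when
L<Rex::GPU> is actually installed. A minimal sketch (the option values
are illustrative):

  # Pass gpu => 1 only if the optional Rex::GPU dependency is present
  my $gpu = eval { require Rex::GPU; 1 } ? 1 : 0;

  task "deploy_auto_gpu", sub {
    rancher_deploy_server(
      distribution => 'rke2',
      gpu          => $gpu,
      token        => 'my-secret',
      tls_san      => '10.0.0.1',
    );
  };
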

When deploying a GPU server node, the full pipeline runs automatically:

=over

=item 1. B<Node preparation> — hostname, timezone, locale, NTP, swap off,
kernel modules (br_netfilter, overlay), sysctl for Kubernetes networking.

=item 2. B<GPU setup> (C<gpu =E<gt> 1>) — NVIDIA driver via DKMS, optional
reboot, Container Toolkit, CDI specs, containerd runtime config. Handled by
L<Rex::GPU>.

=item 3. B<Cluster bring-up> — write config, run RKE2 or K3s install script,
wait for kubeconfig file on the remote host, fetch and save it locally,
wait for API server readiness via L<Kubernetes::REST>.

=item 4. B<Cilium CNI> — Cilium CLI installed on the remote host, Cilium
deployed with distribution-appropriate Helm values.

=item 5. B<NVIDIA device plugin> (C<gpu =E<gt> 1> + C<kubeconfig_file>) — DaemonSet
applied via the Kubernetes API, wait for C<nvidia.com/gpu> capacity on the
node. No C<kubectl> required anywhere.

=back

All Kubernetes API operations (steps 3 and 5) run locally on the machine
executing Rex using L<Kubernetes::REST> and L<IO::K8s>. No C<kubectl>
binary is needed on the remote host.
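
The C<untaint_node> call shown in the SYNOPSIS is one such operation: it
removes the control-plane taint by talking to the API server from the
machine running Rex, using the kubeconfig fetched in step 3:

  # Executes on the machine running Rex, not on the remote host
  untaint_node(kubeconfig => "$ENV{HOME}/.kube/mycluster.yaml");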

This distribution supports hosts without an SFTP subsystem (common on
Hetzner dedicated servers). Use C<set connection =E<gt> "LibSSH"> and
install L<Rex::LibSSH>.
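
In a Rexfile this looks like the following (a minimal sketch; it assumes
the deployment functions are exported by C<use Rex::Rancher>):

  use Rex -feature => ['1.4'];
  use Rex::Rancher;

  # Hosts whose sshd lacks the SFTP subsystem (e.g. some Hetzner
  # dedicated images) work with the libssh-based transport instead.
  set connection => "LibSSH";    # requires Rex::LibSSH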

For fine-grained control, use the individual modules directly:

=over

=item L<Rex::Rancher::Node> — Node preparation

=item L<Rex::Rancher::Server> — Control plane installation and config retrieval

=item L<Rex::Rancher::Agent> — Worker node installation

=item L<Rex::Rancher::Cilium> — Cilium CNI installation and upgrade

=item L<Rex::Rancher::K8s> — Kubernetes API operations (device plugin, readiness, untaint)

=back
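
For example, node preparation can be run on its own. The sketch below is
an assumption, not a confirmed API: it takes C<prepare_node> (named in
the pipeline description below) to be exported by L<Rex::Rancher::Node>
and to accept the same C<hostname> and C<domain> options as the
top-level functions:

  use Rex::Rancher::Node;

  task "prepare_only", sub {
    # Step 1 only: hostname, timezone, swap off, kernel modules,
    # sysctl; no Kubernetes distribution is installed.
    prepare_node(
      hostname => 'node-01',           # illustrative
      domain   => 'k8s.example.com',
    );
  };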

=head2 rancher_deploy_server(%opts)

Full control plane deployment in a single call: prepare the node, optionally
set up GPU support, install the Kubernetes distribution, wait for the API,
install Cilium CNI, and deploy the NVIDIA device plugin.

When C<gpu =E<gt> 1> is passed and L<Rex::GPU> is installed, GPU detection
and driver installation are performed automatically as step 2 before the
cluster is brought up. After Cilium is running, the NVIDIA device plugin
DaemonSet is deployed via the local Kubernetes API (no C<kubectl> required
on the remote host) and the function waits for C<nvidia.com/gpu> resources
to appear on the node.

The full pipeline for a GPU server deployment:

=over

=item 1. C<prepare_node> — hostname, timezone, swap off, kernel modules, sysctl


