App-FargateStack

lib/App/FargateStack/Pod.pm  view on Meta::CPAN

create all the necessary AWS resources.

  app-FargateStack apply

=head3 Step 4: Deploy and Start the Service

The C<apply> command creates all the necessary B<infrastructure>, but
it does not start your service. This separation allows you to manage
your infrastructure and your application's runtime state
independently.

To create the ECS service and start your container, use the
C<deploy-service> command.

  app-FargateStack deploy-service my-stack-daemon

By default, this will start one instance of your task. To check its
status, use the C<status> command:

  app-FargateStack status my-stack-daemon

And to stop the service, simply run:

  app-FargateStack stop-service my-stack-daemon

To restart a stopped service, run:

  app-FargateStack start-service my-stack-daemon

=head2 VPC AND SUBNET DISCOVERY

If you do not specify a C<vpc_id> in your configuration, the framework will attempt
to locate a usable VPC automatically.

A VPC is considered usable if it meets the following criteria:

=over 4

=item * It is attached to an Internet Gateway (IGW)

=item * It has at least one available NAT Gateway

=back

If no eligible VPCs are found, the process will fail with an error. If multiple
eligible VPCs are found, the framework will abort and list the candidate VPC IDs.
You must then explicitly set C<vpc_id> in your configuration to resolve
the ambiguity.

If exactly one eligible VPC is found, it will be used automatically,
and a warning will be logged to indicate that the selection was
inferred.
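The VPC selection rules above can be sketched as follows. This is a minimal Python illustration, not the framework's Perl implementation. It assumes you already have the C<InternetGateways> and C<NatGateways> lists returned by the EC2 C<DescribeInternetGateways> and C<DescribeNatGateways> APIs. The function names C<eligible_vpcs> and C<select_vpc> are hypothetical.

```python
import logging

log = logging.getLogger("vpc-discovery")


def eligible_vpcs(internet_gateways, nat_gateways):
    """Return sorted IDs of VPCs with an attached IGW and an available NAT Gateway.

    The arguments mirror the "InternetGateways" and "NatGateways" lists
    returned by EC2 DescribeInternetGateways and DescribeNatGateways.
    """
    with_igw = {
        attachment["VpcId"]
        for igw in internet_gateways
        for attachment in igw.get("Attachments", [])
        if "VpcId" in attachment
    }
    with_nat = {
        nat["VpcId"]
        for nat in nat_gateways
        if nat.get("State") == "available"
    }
    return sorted(with_igw & with_nat)


def select_vpc(internet_gateways, nat_gateways):
    """Apply the zero / many / exactly-one rules to the eligible VPCs."""
    candidates = eligible_vpcs(internet_gateways, nat_gateways)
    if not candidates:
        raise RuntimeError("no eligible VPC found; set vpc_id in your configuration")
    if len(candidates) > 1:
        raise RuntimeError(
            "multiple eligible VPCs found: " + ", ".join(candidates)
            + "; set vpc_id in your configuration"
        )
    # Exactly one candidate: use it, but make the inference visible.
    log.warning("no vpc_id configured; using inferred VPC %s", candidates[0])
    return candidates[0]
```

With boto3, the inputs could come from C<ec2.describe_internet_gateways()["InternetGateways"]> and C<ec2.describe_nat_gateways()["NatGateways"]>. Both calls paginate in larger accounts.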

=head2 SUBNET SELECTION

If no subnets are specified in the configuration, the framework will query all
subnets in the selected VPC and categorize them as either public or private.

The task will be placed in a private subnet by default. For this to succeed,
your VPC must have at least one private subnet with a route to a NAT Gateway,
or have appropriate VPC endpoints configured for ECR, S3, STS, CloudWatch Logs,
and any other services your task needs.

If subnets are explicitly specified in your configuration, the
framework will validate them and warn if they lack the internet or
VPC endpoint access that Fargate tasks need.
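A common way to categorize subnets is sketched below in Python. Under these assumptions, a subnet is public when its route table has a non-blackhole C<0.0.0.0/0> route to an Internet Gateway (C<igw-...>). It is private when that default route goes to a NAT Gateway, and isolated (reachable only through VPC endpoints) otherwise. Subnets without an explicit association use the VPC's main route table. The input is assumed to be the C<RouteTables> list from EC2 C<DescribeRouteTables> for the selected VPC. The helper names are hypothetical, and the framework's own rules may differ.

```python
def route_table_for(subnet_id, route_tables):
    """Return the route table governing subnet_id: explicit association, else main."""
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main_table = table
    return main_table


def classify_subnet(subnet_id, route_tables):
    """Classify a subnet as "public", "private" or "isolated" by its default route."""
    table = route_table_for(subnet_id, route_tables)
    if table is None:
        return "isolated"
    for route in table.get("Routes", []):
        # Only the IPv4 default route decides internet reachability.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # A blackhole route points at a deleted target and carries no traffic.
        if route.get("State") == "blackhole":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
        if route.get("NatGatewayId"):
            return "private"
    return "isolated"
```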

=head3 Task placement and Availability Zones

ECS places each task's ENI in exactly one of the configured subnets,
which pins that task to a single AZ. A service can span multiple AZs
when its configuration lists subnets from at least two AZs.

What the framework does:

=over 4

=item * Prefers private subnets

If private subnets are defined in the configuration, tasks are placed
there. If no private subnets are defined, the framework falls back to
public subnets.

=item * Aligns ALB AZs with task placement

When a load balancer is used, the framework enables the ALB in the same
AZ set it selects for tasks (best practice). This is for resilience and
to avoid unnecessary cross-AZ hops; it is not a hard technical requirement.

=item * Requires at least two subnets

The configuration must specify at least two subnets in different AZs.
If subnets are not specified, the framework attempts to discover them,
but still requires at least two usable subnets (either both private or
both public). If fewer than two are available, it errors with guidance.

=back
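The placement rules above amount to a small selection function. In this hypothetical Python sketch, each subnet is a dict with C<SubnetId>, C<AvailabilityZone>, and a C<kind> of C<private> or C<public>, as a prior classification step might produce. The returned AZ list is the set in which an ALB would also be enabled, using public subnets in those same AZs. The name C<choose_task_subnets> is an assumption.

```python
def choose_task_subnets(subnets):
    """Pick task subnets: private if any exist, else public; require two or more AZs.

    Returns (kind, sorted subnet IDs, sorted AZs).
    """
    private = [s for s in subnets if s["kind"] == "private"]
    if private:
        kind, pool = "private", private
    else:
        kind, pool = "public", [s for s in subnets if s["kind"] == "public"]

    # Two subnets in the same AZ do not satisfy the multi-AZ requirement.
    azs = sorted({s["AvailabilityZone"] for s in pool})
    if len(azs) < 2:
        raise ValueError(
            f"need at least two {kind} subnets in different AZs; found {len(azs)} AZ(s)"
        )
    return kind, sorted(s["SubnetId"] for s in pool), azs
```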

Notes on internet access and ALBs:

=over 4

=item * Internet-facing ALB

An internet-facing ALB must be created in public subnets. Tasks may (and
usually should) remain in private subnets behind it.

=item * Egress from private subnets

For image pulls, use either a NAT Gateway in each AZ or VPC endpoints
for ECR (api and dkr) and S3. Other outbound calls still require a NAT
Gateway unless a matching VPC endpoint exists.

=item * Egress from public subnets

If tasks are placed in public subnets and no VPC endpoints are
available, they require C<assignPublicIp=ENABLED> to reach ECR/S3.

=back
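Whether a task gets a public IP is controlled by the C<assignPublicIp> field of the C<awsvpcConfiguration> structure accepted by the ECS C<RunTask> and C<CreateService> APIs. The hypothetical helper below builds that structure from the rule above. It considers only image-pull reachability, and its parameter names are assumptions made for illustration.

```python
def network_configuration(subnet_ids, security_group_ids, public_subnets,
                          has_vpc_endpoints=False):
    """Build the networkConfiguration structure used by ECS RunTask and CreateService.

    A public IP is requested only for tasks in public subnets that have no
    VPC endpoints to reach ECR/S3; tasks in private subnets never get one.
    """
    needs_public_ip = public_subnets and not has_vpc_endpoints
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnet_ids),
            "securityGroups": list(security_group_ids),
            "assignPublicIp": "ENABLED" if needs_public_ip else "DISABLED",
        }
    }
```

The result can be passed as C<networkConfiguration=...> to boto3's C<ecs.run_task> or C<ecs.create_service>.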

=head2 REQUIRED SECTIONS

At minimum, your configuration must include the following:

  app:
    name: my-stack

  tasks:
    my-task:
      image: my-image
      type: daemon | task | http | https

For task types C<http> or C<https>, you must also specify a domain name:

  domain: example.com
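Here is a minimal Python sketch of the required-section checks. It assumes the YAML has already been loaded into a dict, for example with PyYAML's C<yaml.safe_load>. C<validate_config> is a hypothetical name, and the framework may enforce additional rules.

```python
VALID_TYPES = {"daemon", "task", "http", "https"}


def validate_config(config):
    """Return a list of error messages; an empty list means the minimum is present."""
    errors = []
    if not (config.get("app") or {}).get("name"):
        errors.append("app.name is required")

    tasks = config.get("tasks") or {}
    if not tasks:
        errors.append("at least one entry under tasks is required")

    for name, task in tasks.items():
        task = task or {}
        if not task.get("image"):
            errors.append(f"tasks.{name}.image is required")
        task_type = task.get("type")
        if task_type not in VALID_TYPES:
            errors.append(f"tasks.{name}.type must be one of {sorted(VALID_TYPES)}")
        elif task_type in ("http", "https") and not config.get("domain"):
            # domain is a top-level key, shared by all web-facing tasks.
            errors.append(f"domain is required for {task_type} task {name}")
    return errors
```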

=head2 FULL SCHEMA OVERVIEW

The framework will expand and update your configuration file with default values as needed.
Here is the full schema outline. All keys are optional unless otherwise noted:

  ---
  account:
  alb:
    arn:
    name:
    port:
    type:
  app:
    name:             # required
    version:
  certificate_arn:
  cluster:
    arn:
    name:
  default_log_group:
  domain:              # required for http/https tasks
  id:
  last_updated:
  region:
  role:
    arn:
    name:
    policy_name:
  route53:
    profile:
    zone_id:
  security_groups:
    alb:
      group_id:
      group_name:
    fargate:
      group_id:
      group_name:
  subnets:
    private:

=item * Support for only 1 EFS filesystem per task

=item * This framework assumes that the
L<operatingSystemFamily|https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters_ec2.html#runtime-platform_ec2>
is C<LINUX> and the C<cpuArchitecture> is C<X86_64>. This is
unlikely to change.

=back

=head1 TROUBLESHOOTING

=head2 Warning: task placed in a public subnet

When running a task you may see:

  [2025/08/05 03:40:58] run-task: subnet-id: [subnet-7c160c37] is in a public subnet...consider running your jobs in a private subnet

This means the task is being scheduled in a subnet that has a
0.0.0.0/0 route to an Internet Gateway (a public subnet).

While not fatal, placing tasks in public subnets is discouraged unless
you have a specific need.

=head3 Why this matters

Running tasks in public subnets can introduce risk and operational
surprises:

=over 4

=item * Accidental exposure

If the task is assigned a public IP and the security group allows
inbound access, it may be reachable from the internet.

=item * Unintended dependency

Public-subnet egress typically relies on a public IP and the Internet
Gateway. That can bypass intended egress controls, logging, or central
inspection.

=item * Narrow security margin

Safety depends entirely on security groups and NACLs. A small
misconfiguration can expose services or data.

=back

=head3 Recommended pattern

Use private subnets for most Fargate workloads. Private subnets do not
route directly to the internet.

If the task needs outbound access (for example, to pull images from
ECR or call external APIs), use one of:

=over 4

=item * A NAT Gateway (private subnet egress to the internet)

=item * VPC interface endpoints for ECR (ecr.api and ecr.dkr) and a
gateway endpoint for S3, so image pulls stay inside the VPC with no
public IPs

=back

For public-facing applications, the common pattern is: tasks in
private subnets, fronted by a public Application Load Balancer in
public subnets.

=head3 When is a public subnet acceptable?

Use a public subnet only when the task itself must have a public IP
and terminate client connections directly (uncommon). If you do:

=over 4

=item * Set assignPublicIp=ENABLED so the task can reach the internet
via the Internet Gateway

=item * Keep security groups locked down and monitor egress on TCP 443

=back

=head3 Note on image pulls

To pull from ECR, the task needs a path to ECR API, ECR DKR, and S3:

=over 4

=item * Public subnet: requires a public IP (assignPublicIp=ENABLED),
unless you provision VPC endpoints

=item * Private subnet: works via a NAT Gateway, or entirely private
via VPC endpoints (no public IPs)

=back
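The image-pull rules above reduce to a small decision table. This Python sketch is hypothetical and deliberately coarse. It ignores security groups, NACLs, and whether private DNS is enabled on the interface endpoints, any of which can still block the connection.

```python
def can_pull_from_ecr(subnet_kind, assign_public_ip, has_ecr_endpoints):
    """Rough check that a Fargate task has a network path to ECR API, ECR DKR and S3.

    subnet_kind: "public" (default route to an IGW), "private" (default route
    to a NAT Gateway) or "isolated" (no default route).
    has_ecr_endpoints: True if ecr.api and ecr.dkr interface endpoints and an
    S3 gateway endpoint exist in the VPC.
    """
    if has_ecr_endpoints:
        # Endpoints keep image pulls inside the VPC; no public IP or NAT needed.
        return True
    if subnet_kind == "public":
        # The IGW route is only usable by an ENI that has a public IP.
        return assign_public_ip
    if subnet_kind == "private":
        return True
    return False
```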

=head2 My task fails with this message:

 ResourceInitializationError: unable to pull secrets or registry auth:
 The task cannot pull registry auth from Amazon ECR: There is a
 connection issue between the task and Amazon ECR. Check your task
 network configuration. operation error ECR: GetAuthorizationToken,
 exceeded maximum number of attempts, 3, https response error
 StatusCode: 0, RequestID: , request send failed, Post
 "https://api.ecr.us-east-1.amazonaws.com/": dial tcp 44.213.79.10:443:
 i/o timeout

This error usually occurs when your task is launched in a subnet that
does not have outbound access to the internet. Fargate needs internet
access, or properly configured VPC endpoints, to authenticate with ECR
and pull your container image.

=head3 Common causes

=over 4

=item * The task was placed in a public subnet but was not assigned a
public IP.

=item * The task was placed in a private subnet without access to a
NAT gateway or VPC endpoints.

=back

Even though the subnet may have a route to an Internet Gateway (i.e.,
it is technically a "public" subnet), if the task does not receive a
public IP, it cannot use that route to reach external services like
ECR or Secrets Manager.

=head3 How to fix it

=over 4

=item * If using public subnets, ensure the task is assigned a public
IP.

=item * If using private subnets, ensure a NAT gateway is available
and the subnet has a route to it.

=item * Alternatively, configure VPC endpoints for ECR, Secrets
Manager, and related services to avoid needing internet access
altogether.

=back

=head3 Note on Subnet Selection

C<App::FargateStack> attempts to prevent this situation by analyzing
your VPC configuration during planning. It categorizes subnets as
private or public and evaluates whether they provide the necessary
network access to launch a Fargate task successfully. The framework
warns if you attempt to use a subnet that lacks internet or endpoint
access.

=head2 My task failed to start and the reason is unclear

This is one of the most common and frustrating scenarios when working
with Fargate. You run C<start-service> or C<run-task>, the command
seems to succeed, but then the task quickly stops. The C<status>
command shows the desired count is 1 but the running count is 0, and
the logs are empty.

This often happens due to a B<resource initialization error>. The
problem isn't with your container image itself, but with the
infrastructure Fargate is trying to set up for it.

Common causes include:

=over 4

=item *

B<Networking Issues>: The task is in a subnet that can't pull the
image from ECR (e.g., no NAT Gateway or VPC endpoints).

=item *

B<Permissions Errors>: The task's IAM role is missing a required
permission.

=item *

B<EFS Mount Failures>: The task cannot mount an EFS volume, often due
to a misconfigured security group or incorrectly specified path.

=back

These errors are opaque because they happen deep inside the
AWS-managed environment. The high-level ECS API only reports a generic
failure, and since it's not an API call error, it won't appear in
CloudTrail.

=head3 The Solution: Finding the C<stoppedReason>

To solve this, C<App-FargateStack> provides an optional argument to
the C<list-tasks> command. By default, this command only shows
C<RUNNING> tasks. However, if you add the C<stopped> argument, it will
show recently stopped tasks and, most importantly, the reason they
stopped.

B<The Command:>

 app-FargateStack list-tasks stopped

This will display a table of stopped tasks, including a C<Stopped
Reason> column. This column often contains the detailed, multi-line
error message from the underlying AWS service that caused the failure,
giving you the exact information you need to debug the problem.

For example, if an EFS mount failed, the C<stoppedReason> might
contain:

 ResourceInitializationError: failed to invoke EFS utils
 commands... mount.nfs4: mounting failed, reason given by server: No
 such file or directory

This tells you immediately that the problem is with the EFS path, not
a generic "task failed" message.
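If you want the same information outside the framework, the ECS API exposes it directly. C<ListTasks> with C<desiredStatus=STOPPED> returns recently stopped task ARNs, and C<DescribeTasks> returns each task's C<stopCode> and C<stoppedReason>. The Python sketch below uses boto3 method names. It is not how C<list-tasks> is implemented, and C<stopped_task_reasons> is a hypothetical helper.

```python
def stopped_task_reasons(ecs, cluster):
    """Return (taskArn, stopCode, stoppedReason) tuples for recently stopped tasks.

    `ecs` is expected to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    arns, token = [], None
    while True:
        kwargs = {"cluster": cluster, "desiredStatus": "STOPPED"}
        if token:
            kwargs["nextToken"] = token
        page = ecs.list_tasks(**kwargs)
        arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            break

    rows = []
    # DescribeTasks accepts at most 100 task ARNs per call.
    for start in range(0, len(arns), 100):
        resp = ecs.describe_tasks(cluster=cluster, tasks=arns[start:start + 100])
        for task in resp.get("tasks", []):
            rows.append((task["taskArn"],
                         task.get("stopCode", ""),
                         task.get("stoppedReason", "")))
    return rows
```

For example, C<stopped_task_reasons(boto3.client("ecs"), "my-cluster")>, where C<my-cluster> stands in for your cluster name.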

=head2 Why is my task or service still using an old image?

This is one of the most common points of confusion when working with
ECS and Fargate.

You may have just built and pushed a new image to ECR using the same
tag (e.g. C<latest>), but when you launch a task or deploy a service,
ECS appears to continue using the old image.  Here's why.

=head3 One-off tasks: C<run-task> uses a fixed image digest

When you run a task using:

  app-FargateStack run-task my-task


