App-FargateStack

README.md  view on Meta::CPAN

- automatic creation of log groups with customizable retention period
- discovery of existing environment to intelligently populate configuration defaults
- automatically create a minimal Fargate app/service config from shorthand
- support for scheduled and metric based [autoscaling](#autoscaling)

## Minimal Configuration

Getting a Fargate task up and running requires that you provision and
configure multiple AWS resources. Stitching them together using
**Terraform** or **CloudFormation** can be tedious and time consuming,
even if you know what resources to provision AND how to stitch them
together.

The motivation behind writing this framework was to take the drudgery
out of writing declarative resource generators for all of the resources
required to run a simple task or to create basic web applications or
RESTful APIs. Instead, we wanted a framework that covered 90% of our use
cases while allowing our development workflow to go something like:

- Create a Docker image that implements our worker, web app or API
- Create a minimal configuration file that describes our application
- Execute the framework's script and create the necessary AWS infrastructure
- Launch the http server, daemon, scheduled job, or adhoc worker

Of course, this is only a "good idea" if creating the initial
configuration file is truly minimal, otherwise it becomes an exercise
similar to using Terraform or CloudFormation. So what is the minimum
amount of configuration to inform our framework so it can create our
Fargate worker? How's this for minimal?

    ---
    app:
      name: my-stack
    tasks:
      my-worker:
        type: task
        image: my-worker:latest
        schedule: cron(50 12 * * * *)

_TIP: You can use the ["create-stack"](#create-stack) command to create minimal
configuration files for various Fargate application scenarios._

Using this minimal configuration and running `app-FargateStack` like this:

    app-FargateStack plan

...the framework would create the following resources in your VPC:

- a cluster named `my-stack-cluster`
- a security group for the cluster
- an IAM role for the cluster
- an IAM policy that has permissions enabling your worker
- an ECS task definition that describes your task
- a CloudWatch log group
- an EventBridge target event
- an IAM role for EventBridge
- an IAM policy for EventBridge
- an EventBridge rule that schedules the worker

...so as you can see, rolling all of this by hand could be a daunting
task, and one made even more difficult when you decide to use other AWS
resources inside your task like buckets, queues or EFS file
systems!

## Web Applications

Creating a web application using a minimal configuration works too. To
build a web application you can start with this minimal configuration:

    ---
    app:
      name: my-web-app
    domain: my-web-app.example.com
    tasks:
      apache:
        type: https
        image: my-web-app:latest

This will create an externally facing web application for you with
these resources:

- A certificate for your domain
- A Fargate cluster
- IAM roles and policies
- A listener and listener rules
- A CloudWatch log group
- Security groups
- A target group
- A task definition
- An ALB if one is not detected

Once again, launching a Fargate service requires a
lot of fiddling with AWS resources! Getting all of the plumbing
installed and working requires knowing both what to provision and how
to wire it together.

## Adding or Changing Resources

Adding or updating resources for an existing application should also
be easy. Updating the infrastructure should just be a matter of
updating the configuration and re-running the framework's script. When
you update the configuration, `App::FargateStack` will detect the
changes and update the necessary resources.

Currently the framework supports adding a single SQS queue, a single
S3 bucket, volumes using EFS mount points, environment variables and
secrets from AWS Secrets Manager.

    my-worker:
      image: my-worker:latest
      command: /usr/local/bin/my-worker.pl
      type: task
      schedule: cron(00 15 * * * *)   
      bucket:
        name: my-worker-bucket
      queue:
        name: my-worker-queue
      environment:
        ENVIRONMENT=prod
      secrets:
        db_password:DB_PASSWORD
      efs:

#### Rule Set Keywords

- **base**: A strong baseline including `AWSManagedRulesCommonRuleSet`, `AWSManagedRulesAmazonIpReputationList`, and `AWSManagedRulesKnownBadInputsRuleSet`.
- **admin**: Protects exposed administrative pages (`AWSManagedRulesAdminProtectionRuleSet`).
- **sql**: Protects against SQL injection attacks (`AWSManagedRulesSQLiRuleSet`).
- **linux**: Includes rules for Linux and Unix-like environments.
- **php**: Includes rules for applications running on PHP.
- **wordpress**: Includes rules specific to WordPress sites.
- **windows**: Includes rules for Windows Server environments.
- **anonymous**: **Use with caution.** Blocks traffic from anonymous sources like VPNs and proxies, which may block legitimate users.
- **ddos**: Mitigates application-layer (Layer 7) DDoS attacks like HTTP floods.
- **premium**: **Warning: Extra Cost.** Enables advanced, paid protections for bot control and account takeover prevention.

#### Rule Bundles

- **default**: Includes `base` and `sql`. This is the recommended starting point for most applications.
- **linux-app**: Includes `default` and `linux`.
- **wordpress-app**: Includes `default`, `linux`, and `wordpress`.
- **windows-app**: Includes `default` and `windows`.
- **all**: Includes all standard, non-premium rule sets. **Warning:** This will likely exceed the default WCU quota and may incur additional costs.
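To make the composition of bundles concrete, here is a minimal Python sketch that expands bundle names into the underlying rule set keywords. The `BUNDLES` table and the `expand_rules` function are hypothetical names used for illustration, not part of `App::FargateStack`. The `all` bundle is left out because its exact membership is not spelled out above.

```python
# Hypothetical illustration: bundle names map to keywords or other bundles.
BUNDLES = {
    "default": ["base", "sql"],
    "linux-app": ["default", "linux"],
    "wordpress-app": ["default", "linux", "wordpress"],
    "windows-app": ["default", "windows"],
}


def expand_rules(names):
    """Expand bundles recursively into an ordered, de-duplicated keyword list."""
    expanded = []

    def visit(name):
        if name in BUNDLES:
            for member in BUNDLES[name]:
                visit(member)
        elif name not in expanded:
            expanded.append(name)

    for name in names:
        visit(name)
    return expanded


print(expand_rules(["wordpress-app"]))   # ['base', 'sql', 'linux', 'wordpress']
print(expand_rules(["default", "php"]))  # ['base', 'sql', 'php']
```

Keywords that are not bundle names pass through unchanged, so bundles and individual keywords can be mixed in one list.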

### The Bootstrap Process (First Run)

On the first `apply` run with WAF enabled, the framework will perform
a one-time bootstrap:

1. It generates a default `web-acl.json` file in your project
directory. This file contains the complete definition of your Web ACL,
including the rules generated from your `managed_rules` keywords.
2. It calls `aws wafv2 create-web-acl` to create a new Web ACL.
3. It calls `aws wafv2 associate-web-acl` to link the new Web ACL to
your Application Load Balancer.
4. It updates your configuration file with the state of the new
WAF resources, including the Web ACL's Name, ID, ARN, LockToken, and a
checksum of the `web-acl.json` file.
5. The `waf` block in your `fargate-stack.yml` is updated to reflect
the bootstrapped state. If the `managed_rules` key was not present,
it will be added with the default value of `[default]`.

### Ongoing Management (Subsequent Runs)

After the initial creation, you take full control of the rules. To
add, remove, or modify rules, you simply edit the `web-acl.json` file
directly.

On subsequent runs of `apply`, `App::FargateStack` will:

- Calculate a checksum of your `web-acl.json` file.
- If the checksum has changed, it will safely update the remote Web ACL
with your new rule set.
- If the checksum has not changed, it will skip the update to avoid
unnecessary API calls.
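The checksum comparison described above can be sketched in a few lines of Python. This is only an illustration: the digest algorithm (SHA-256 here) and the names `checksum` and `needs_update` are assumptions, since this section does not say how the framework computes or stores the checksum.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Assumption: SHA-256. Any stable digest would serve the same purpose.
    return hashlib.sha256(data).hexdigest()


def needs_update(current: bytes, stored_checksum: str) -> bool:
    """True when the file contents no longer match the recorded checksum."""
    return checksum(current) != stored_checksum


original = b'{"Name": "my-web-acl", "Rules": []}'
stored = checksum(original)  # what would be recorded after the first apply
edited = b'{"Name": "my-web-acl", "Rules": [{"Name": "block-bad-bots"}]}'

print(needs_update(original, stored))  # False: skip the update
print(needs_update(edited, stored))    # True: push the new rule set
```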

This model gives you the best of both worlds: the "minimal
configuration, maximum results" of a secure default, and the full
"transparent box" control to customize your security posture as your
application's needs evolve.

### Conflict and Drift Management

The framework includes robust safety checks to prevent accidental data
loss. If it detects that the Web ACL has been modified in the AWS
Console _and_ you have also modified your local `web-acl.json` file,
it will detect the state conflict, refuse to make any changes, and
provide a clear error message with instructions on how to resolve it.
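As a rough model of the decision described here, the following Python sketch maps the two change flags to an outcome. The function name `waf_action` and the returned labels are hypothetical. How each flag is derived (for example, from the stored checksum and LockToken) is also an assumption.

```python
def waf_action(local_changed: bool, remote_changed: bool) -> str:
    """Pick an outcome from the local and remote change flags."""
    if local_changed and remote_changed:
        # Both sides changed: refuse to modify anything and report an error.
        return "conflict"
    if local_changed:
        # Only web-acl.json changed: update the remote Web ACL.
        return "update"
    if remote_changed:
        # Only the remote changed: this section does not describe the
        # framework's behavior for this case, so it is only labeled here.
        return "remote-drift"
    return "skip"


print(waf_action(local_changed=True, remote_changed=True))   # conflict
print(waf_action(local_changed=True, remote_changed=False))  # update
```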

### Estimated Cost

The default WAF configuration is designed to provide a strong security
baseline while remaining cost-effective. When you enable WAF without
specifying any `managed_rules`, the framework applies the `default`
bundle, which includes the `base` and `sql` rule sets.

The approximate monthly cost for this default configuration is
**~$9.00 per month**, plus per-request charges.

The cost is broken down as follows:

- **$5.00 / month** for the Web ACL itself.
- **$4.00 / month** for the four AWS Managed Rule Groups included
in the `default` bundle (3 in 'base', 1 in 'sql').
- **$0.60 per 1 million requests** processed by the Web ACL.
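The arithmetic can be checked with a small Python sketch. The per-rule-group price of $1.00 is derived from the $4.00 figure for four groups, and the function name `estimate_monthly_cost` is hypothetical. Actual charges depend on current AWS pricing.

```python
WEB_ACL_MONTHLY = 5.00        # per Web ACL
RULE_GROUP_MONTHLY = 1.00     # derived: $4.00 / 4 managed rule groups
PER_MILLION_REQUESTS = 0.60   # per 1 million requests


def estimate_monthly_cost(requests_per_month: int, rule_groups: int = 4) -> float:
    """Estimate the monthly cost in USD for the default bundle."""
    fixed = WEB_ACL_MONTHLY + rule_groups * RULE_GROUP_MONTHLY
    variable = requests_per_month / 1_000_000 * PER_MILLION_REQUESTS
    return round(fixed + variable, 2)


print(estimate_monthly_cost(0))           # 9.0
print(estimate_monthly_cost(10_000_000))  # 15.0
```

With 10 million requests in a month, the request charge ($6.00) is already two thirds of the fixed charge.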

**Warning:** Enabling the `premium` rule set will incur significant
additional monthly and per-request fees for services like Bot Control
and Account Takeover Prevention. Always review the [AWS WAF
pricing](https://aws.amazon.com/waf/pricing/) page before enabling
premium features.

## Roadmap for HTTP Services

- path based routing on ALB listeners

[Back to Table of Contents](#table-of-contents)

# AUTOSCALING

## Overview

For services that experience variable load, such as HTTP applications or
background job processors, `App::FargateStack` can automate the process of
scaling the number of running tasks up or down to meet demand. This ensures
high availability during traffic spikes and saves costs during quiet periods.

The framework integrates with AWS Application Auto Scaling to provide target
tracking scaling policies. This allows you to define a target metric - such as
average CPU utilization or the number of requests per minute - and the framework
will automatically manage the number of Fargate tasks to keep that metric at
your desired level.

## Enabling Autoscaling

To enable autoscaling for a service, add an `autoscaling` block to its task
configuration in your .yml configuration file.

    tasks:
      my-service:
        # ... other task settings ...
        autoscaling:
          min_capacity: 1
          max_capacity: 10
          cpu: 60

## Configuration Parameters

The `autoscaling` block accepts the following keys:

- **min\_capacity** (Required)

    The minimum number of tasks to keep running at all times. The service will
    never scale in below this number.

- **max\_capacity** (Required)

    The maximum number of tasks that the service can scale out to. This acts as
    a safeguard to control costs.

- **cpu** OR **requests** (Required, mutually exclusive)

    You must specify exactly one scaling metric.

    - `cpu`: The target average CPU utilization percentage across all tasks in
    the service. Valid values are between 1 and 100.
    - `requests`: The target number of requests per minute for each task. This
    is only valid for tasks of type `http` or `https` that are behind an
    Application Load Balancer.

- **scale\_in\_cooldown** (Optional)

    The amount of time, in seconds, to wait after a scale-in activity before
    another scale-in activity can start. This prevents the service from scaling
    in too aggressively.

    Default: `300`

- **scale\_out\_cooldown** (Optional)

    The amount of time, in seconds, to wait after a scale-out activity before
    another scale-out activity can start. This allows new tasks time to warm up
    and start accepting traffic before the service decides to scale out again.

    Default: `60`

- **policy\_name** (Managed by `App::FargateStack`)

    This is a unique name for the scaling policy generated by the framework. It
    is written to your configuration file and used to detect drift between your
    configuration and the live environment in AWS. You should not modify this
    value.
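The rules above can be summarized as a small validation routine. The Python sketch below is illustrative only: `validate_autoscaling` and `with_defaults` are hypothetical names, and the framework's real validation and error messages may differ.

```python
HTTP_TYPES = {"http", "https"}


def validate_autoscaling(task_type: str, cfg: dict) -> list:
    """Return a list of problems found in an autoscaling block."""
    errors = []
    for key in ("min_capacity", "max_capacity"):
        if key not in cfg:
            errors.append(f"{key} is required")
    # Exactly one scaling metric must be present.
    metrics = [m for m in ("cpu", "requests") if m in cfg]
    if len(metrics) != 1:
        errors.append("specify exactly one of cpu or requests")
    if "cpu" in cfg and not 1 <= cfg["cpu"] <= 100:
        errors.append("cpu must be between 1 and 100")
    if "requests" in cfg and task_type not in HTTP_TYPES:
        errors.append("requests is only valid for http or https tasks")
    return errors


def with_defaults(cfg: dict) -> dict:
    """Fill in the documented cooldown defaults when they are omitted."""
    return {"scale_in_cooldown": 300, "scale_out_cooldown": 60, **cfg}


print(validate_autoscaling("daemon", {"min_capacity": 1, "max_capacity": 5, "cpu": 60}))
print(validate_autoscaling("daemon", {"min_capacity": 1, "max_capacity": 5, "requests": 500}))
print(with_defaults({"min_capacity": 1, "max_capacity": 5, "cpu": 60}))
```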

## Example: Scaling on CPU Utilization

This configuration will maintain at least 1 task, scale up to a maximum of 5
tasks, and will add or remove tasks to keep the average CPU utilization at or
near 60%.

    tasks:
      my-cpu-intensive-worker:
        type: daemon
        image: my-worker:latest
        autoscaling:
          min_capacity: 1
          max_capacity: 5
          cpu: 60

## Example: Scaling on ALB Requests

This configuration will maintain at least 2 tasks, scale up to a maximum of 20
tasks, and will add or remove tasks to keep the number of requests per minute
for each task at or near 1000. It also specifies custom cooldown periods.

    tasks:
      my-website:
        type: https
        image: my-website:latest
        autoscaling:
          min_capacity: 2
          max_capacity: 20
          requests: 1000
          scale_in_cooldown: 600
          scale_out_cooldown: 120

## Scheduled Scaling Configuration

To configure predictive, time-based scaling, add a `scheduled` block
inside the main `autoscaling` configuration. This allows you to
define named time windows for scaling.

Example:

    autoscaling:
      ...
      scheduled:
        business_hours:
          start_time: "00:18"
          end_time: "00:02"
          days: MON-FRI
          min_capacity: 2/1
          max_capacity: 3/2

_Note: **start\_time** and **end\_time** are UTC_

- **scheduled** (Optional)

    A hash where each key is a unique, descriptive name for the schedule
    group (e.g., `business_hours`). Each group defines a time window and
    the capacity changes for that window.

    - **start\_time** (Required): The time to scale up, in HH:MM
    format (24-hour clock, UTC).
    - **end\_time** (Required): The time to scale down, in HH:MM
    format (24-hour clock, UTC).
    - **days** (Required): The days of the week for the schedule. Can
    be a range (e.g., `MON-FRI`) or comma-separated values.
    - **min\_capacity** (Optional): The minimum capacity during and
    outside the window. The two values should be separated by a slash,
    comma, colon, hyphen, or space (e.g., `2/1` or `2,1`).
    - **max\_capacity** (Optional): The maximum capacity during and
    outside the window, using the same `in/out` format as
    `min_capacity`.

The parser will generate two scheduled actions from this block: one to
apply the "in" capacity at the `start_time` and one to apply the
"out" capacity at the `end_time`.
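A minimal Python sketch of this expansion follows. It makes several assumptions: the action names (`-in` and `-out` suffixes), the dictionary layout, and the use of AWS's six-field `cron(...)` syntax for the schedule are illustrative and may not match what `App::FargateStack` actually generates.

```python
import re


def split_capacity(value):
    """Split an in/out pair such as '2/1', '2,1' or '2 1' into two integers."""
    parts = [p for p in re.split(r"[/,:\-\s]+", str(value).strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two values, got {value!r}")
    return int(parts[0]), int(parts[1])


def pair(group, key):
    # min_capacity and max_capacity are optional in a schedule group.
    return split_capacity(group[key]) if key in group else (None, None)


def to_cron(hhmm, days):
    """Turn HH:MM plus a day spec into a six-field AWS cron expression (UTC)."""
    hour, minute = (int(x) for x in hhmm.split(":"))
    return f"cron({minute} {hour} ? * {days} *)"


def scheduled_actions(name, group):
    """Build the 'in' action for start_time and the 'out' action for end_time."""
    min_in, min_out = pair(group, "min_capacity")
    max_in, max_out = pair(group, "max_capacity")
    return [
        {"name": f"{name}-in", "schedule": to_cron(group["start_time"], group["days"]),
         "min_capacity": min_in, "max_capacity": max_in},
        {"name": f"{name}-out", "schedule": to_cron(group["end_time"], group["days"]),
         "min_capacity": min_out, "max_capacity": max_out},
    ]


actions = scheduled_actions("business_hours", {
    "start_time": "09:00", "end_time": "18:00", "days": "MON-FRI",
    "min_capacity": "2/1", "max_capacity": "10/2",
})
for action in actions:
    print(action)
```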

## Example: Combined Metric and Scheduled Scaling

This configuration creates a robust scaling strategy. The service will
reactively scale based on CPU load at all times, but the capacity
"guardrails" will be adjusted automatically for business hours.

    tasks:
      my-website:
        type: https
        image: my-website:latest
        autoscaling:
          # Default metric-based scaling policy
          min_capacity: 1
          max_capacity: 10
          cpu: 75
    
          # Scheduled scaling actions to adjust the guardrails
          scheduled:
            business_hours:
              start_time: "09:00"
              end_time: "18:00"
              days: MON-FRI
              min_capacity: 2/1
              max_capacity: 10/2

## Drift Detection and Management

`App::FargateStack` treats your YAML configuration as the single source of
truth. On every `plan` or `apply` run, it will compare the `autoscaling`
configuration in your file with the live scaling policy in AWS.

If it detects any differences (e.g., someone manually changed the max capacity
in the AWS Console), it will report the drift and will not apply any changes.
To overwrite the live settings and enforce the configuration from your file,
you must re-run the `apply` command with the `--force` flag. This provides a
critical safety check against accidental configuration changes.
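The comparison can be pictured with the Python sketch below. The set of keys that are compared, and the names `find_drift` and `decide`, are assumptions made for illustration. Only the overall behavior (report drift, change nothing unless `--force` is given) comes from the description above.

```python
COMPARED_KEYS = (
    "min_capacity", "max_capacity", "cpu", "requests",
    "scale_in_cooldown", "scale_out_cooldown",
)


def find_drift(configured: dict, live: dict) -> dict:
    """Map each differing key to its (configured, live) pair of values."""
    return {
        key: (configured.get(key), live.get(key))
        for key in COMPARED_KEYS
        if configured.get(key) != live.get(key)
    }


def decide(drift: dict, force: bool) -> str:
    if not drift:
        return "in-sync"
    # Without --force the drift is reported and nothing is changed.
    return "overwrite-live" if force else "report-only"


configured = {"min_capacity": 1, "max_capacity": 10, "cpu": 60}
live = {"min_capacity": 1, "max_capacity": 20, "cpu": 60}  # changed in the console

drift = find_drift(configured, live)
print(drift)                       # {'max_capacity': (10, 20)}
print(decide(drift, force=False))  # report-only
print(decide(drift, force=True))   # overwrite-live
```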

### The `autoscaling` keyword

For any service type (`https`, `http`, `daemon`, or `task`), you can enable
and configure autoscaling directly from the command line. This provides a
quick-start method to make your service elastic from the moment it's created.

The `autoscaling:` keyword accepts a metric and an optional target value:

- **Enable with a specific target value:**

        autoscaling:requests=500
        autoscaling:cpu=60

    This will enable autoscaling and set the target for either ALB requests per
    task or average CPU utilization.

- **Enable with default target value:**

        autoscaling:requests
        autoscaling:cpu

    If you omit the target value, a sensible default will be used (e.g.,
    `500` for requests, `60` for CPU).

When the `create-stack` command sees the `autoscaling:` keyword, it
will generate a complete `autoscaling` block in your `fargate-stack.yml`
file. This block will be populated with safe defaults (`min_capacity: 1`,
`max_capacity: 2`), the specified metric, and all other necessary fields,
making it easy to review and customize later. See ["AUTOSCALING"](#autoscaling) for
a full list of configuration options.
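As an illustration of how such a keyword could be turned into a configuration block, here is a hedged Python sketch. The function name `parse_autoscaling_keyword` is hypothetical, and the real generated block contains additional fields that are not modeled here. The defaults are the ones quoted above.

```python
DEFAULT_TARGETS = {"requests": 500, "cpu": 60}


def parse_autoscaling_keyword(keyword: str) -> dict:
    """Parse 'autoscaling:<metric>[=<target>]' into a minimal autoscaling block."""
    prefix, _, spec = keyword.partition(":")
    if prefix != "autoscaling" or not spec:
        raise ValueError(f"not an autoscaling keyword: {keyword!r}")
    metric, _, target = spec.partition("=")
    if metric not in DEFAULT_TARGETS:
        raise ValueError(f"unknown metric: {metric!r}")
    return {
        "min_capacity": 1,  # safe defaults described above
        "max_capacity": 2,
        metric: int(target) if target else DEFAULT_TARGETS[metric],
    }


print(parse_autoscaling_keyword("autoscaling:cpu"))
# {'min_capacity': 1, 'max_capacity': 2, 'cpu': 60}
print(parse_autoscaling_keyword("autoscaling:requests=750"))
# {'min_capacity': 1, 'max_capacity': 2, 'requests': 750}
```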

[Back to Table of Contents](#table-of-contents)

# CURRENT LIMITATIONS

- Stacks may contain multiple daemon services, but only one task
may be exposed as an HTTP/HTTPS service via an ALB.
- Limited configuration options for some resources such as
advanced load balancer listener rules, custom CloudWatch metrics, or
task health check tuning.
- Some out of band infrastructure changes may break the ability
to re-run `app-FargateStack` without manually updating the
configuration
- Support for only 1 EFS filesystem per task
- This framework assumes that the
[operatingSystemFamily](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters_ec2.html#runtime-platform_ec2)
is "LINUX" and the `cpuArchitecture` is "X86\_64". This is
unlikely to change.

[Back to Table of Contents](#table-of-contents)

# TROUBLESHOOTING

## Warning: task placed in a public subnet

When running a task you may see:

    [2025/08/05 03:40:58] run-task: subnet-id: [subnet-7c160c37] is in a public subnet...consider running your jobs in a private subnet

This means the task is being scheduled in a subnet that has a
0.0.0.0/0 route to an Internet Gateway (a public subnet).

While not fatal, placing tasks in public subnets is discouraged unless
you have a specific need.
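The definition of a public subnet used here can be expressed as a simple check over a route table. The Python sketch below assumes route entries shaped like those returned by the EC2 `DescribeRouteTables` API (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`). The function name `is_public_subnet` is hypothetical, and this is not necessarily how the framework performs the check.

```python
def is_public_subnet(routes: list) -> bool:
    """True if any route sends 0.0.0.0/0 to an Internet Gateway (igw-...)."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and str(route.get("GatewayId", "")).startswith("igw-")
        for route in routes
    )


public_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]

print(is_public_subnet(public_routes))   # True
print(is_public_subnet(private_routes))  # False
```

The gateway IDs above are made-up placeholders.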

### Why this matters

Running tasks in public subnets can introduce risk and operational
surprises:

- Accidental exposure

    If the task is assigned a public IP and the security group allows
    inbound access, it may be reachable from the internet.

- Unintended dependency

    Public-subnet egress typically relies on a public IP and the Internet
    Gateway. That can bypass intended egress controls, logging, or central
    inspection.

- Narrow security margin

    Safety depends entirely on security groups and NACLs. A small
    misconfiguration can expose services or data.

### Recommended pattern

Use private subnets for most Fargate workloads. Private subnets do not
route directly to the internet.

If the task needs outbound access (for example, to pull images from
ECR or call external APIs), use one of:

- A NAT Gateway (private subnet egress to the internet)
- VPC interface endpoints for ECR (ecr.api and ecr.dkr) and a
gateway endpoint for S3, so image pulls stay inside the VPC with no
public IPs

For public-facing applications, the common pattern is: tasks in
private subnets, fronted by a public Application Load Balancer in
public subnets.

### When is a public subnet acceptable?

Use a public subnet only when the task itself must have a public IP
and terminate client connections directly (uncommon). If you do:

- Set assignPublicIp=ENABLED so the task can reach the internet
via the Internet Gateway
- Keep security groups locked down and monitor egress on TCP 443

### Note on image pulls

To pull from ECR, the task needs a path to ECR API, ECR DKR, and S3:

- Public subnet: requires a public IP (assignPublicIp=ENABLED),
unless you provision VPC endpoints
- Private subnet: works via a NAT Gateway, or entirely private
via VPC endpoints (no public IPs)
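The reachability rules above reduce to a short decision function. The Python sketch below is a simplified model with hypothetical parameter names. It ignores security groups, NACLs and DNS, all of which can also block an image pull.

```python
def can_pull_from_ecr(public_subnet: bool,
                      assign_public_ip: bool = False,
                      nat_gateway: bool = False,
                      vpc_endpoints: bool = False) -> bool:
    """Simplified check for a network path to ECR API, ECR DKR and S3."""
    if vpc_endpoints:
        # Interface endpoints for ECR plus a gateway endpoint for S3.
        return True
    if public_subnet:
        # Egress via the Internet Gateway requires a public IP on the task.
        return assign_public_ip
    # Private subnet without endpoints: a NAT Gateway is required.
    return nat_gateway


print(can_pull_from_ecr(public_subnet=True))                         # False
print(can_pull_from_ecr(public_subnet=True, assign_public_ip=True))  # True
print(can_pull_from_ecr(public_subnet=False, nat_gateway=True))      # True
```

The first case, a public subnet with no public IP and no endpoints, is one way to end up with a connection timeout during the image pull.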

## My task fails with this message:

    ResourceInitializationError: unable to pull secrets or registry auth:
    The task cannot pull registry auth from Amazon ECR: There is a
    connection issue between the task and Amazon ECR. Check your task
    network configuration. operation error ECR: GetAuthorizationToken,
    exceeded maximum number of attempts, 3, https response error
    StatusCode: 0, RequestID: , request send failed, Post
    "https://api.ecr.us-east-1.amazonaws.com/": dial tcp 44.213.79.10:443:
    i/o timeout


