AWS Fargate DockerTimeoutError

Save on network usage

If you’re using Fargate/ECS with container images from ECR, you might have noticed that the images are fetched through the public network (your internet or NAT gateway), which makes for a significantly higher than expected monthly bill. Looking for a way to avoid that, you quickly find out about a neat feature called “VPC Endpoints”. It works really well, and when you use it with your EC2 instances everything behaves as expected.

Doesn’t start in Fargate

However, when you want to start a task or service using Fargate, you keep getting a DockerTimeoutError.

It is a really annoying error message that doesn’t say anything specific. I ran into it after moving my services over to the VPC endpoints. I checked IAM roles, Security Groups, Route Tables, everything I could think of, but didn’t find a solution until I came across this bug report: https://github.com/aws/containers-roadmap/issues/48

It is CloudWatch, not Docker

It turned out the error has nothing to do with Docker or ECR. The problem is that Fargate can no longer write to CloudWatch the way it used to.

The fix was relatively simple:

  1. Add a CloudWatch Logs endpoint to your VPC
  2. Enable private DNS for the endpoint

To do that, go to VPC -> Endpoints -> Create Endpoint and type logs in the service search box.

Pick your VPC:

Select the subnets, enable DNS name and select the Security Groups:
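If you prefer to script the console steps, here is a minimal sketch using boto3's `create_vpc_endpoint` call. The helper names and the IDs in the usage section are made up for illustration, and I'm assuming you have boto3 installed and credentials configured, so treat it as a starting point rather than a drop-in solution.

```python
def logs_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build the keyword arguments for EC2's create_vpc_endpoint call.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # CloudWatch Logs service name for the given region
        "ServiceName": f"com.amazonaws.{region}.logs",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Step 2 of the fix: enable private DNS for the endpoint
        "PrivateDnsEnabled": True,
    }


def create_logs_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = logs_endpoint_params(region, vpc_id, subnet_ids, security_group_ids)
    response = ec2.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Placeholder IDs (assumptions): replace them with your own VPC,
    # subnets and security groups. This only prints the request.
    print(logs_endpoint_params(
        "eu-west-1",
        "vpc-0123456789abcdef0",
        ["subnet-0123456789abcdef0"],
        ["sg-0123456789abcdef0"],
    ))
```

Calling `create_logs_endpoint(...)` with your real IDs should do the same thing as the console wizard. Building the parameters in a separate function makes it easy to check the request before anything is created.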

Verify

You can verify that it’s working by checking which IP address the CloudWatch Logs hostname resolves to from an EC2 instance, if you have one in the same subnet where you’re starting your Fargate task/service.

Don’t forget to change eu-west-1 to your region.
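One way to run that check is a small Python script on the EC2 instance. This is only a sketch: the function names are my own, and it assumes that with private DNS enabled the regional hostname `logs.<region>.amazonaws.com` resolves to private addresses from your subnets.

```python
import ipaddress
import socket


def logs_hostname(region):
    """Regional CloudWatch Logs hostname, e.g. logs.eu-west-1.amazonaws.com."""
    return f"logs.{region}.amazonaws.com"


def resolve_ips(hostname):
    """Return the sorted, de-duplicated IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(ips):
    """True if there is at least one address and every address is private."""
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


if __name__ == "__main__":
    host = logs_hostname("eu-west-1")  # change to your region
    try:
        ips = resolve_ips(host)
    except socket.gaierror as err:
        print(f"could not resolve {host}: {err}")
    else:
        print(host, ips)
        if all_private(ips):
            print("resolves to private addresses (VPC endpoint in use)")
        else:
            print("resolves to public addresses (no endpoint or private DNS disabled)")
```

A command line tool such as `dig` or `nslookup` on the same hostname gives you the same information.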

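Before: the CloudWatch Logs hostname resolves to public AWS IP addresses.

After: it resolves to private IP addresses from the subnets you selected for the endpoint.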
