seeding a dynamodb-local instance with docker-compose

TL;DR - to seed a dynamodb-local database, you can run multiple AWS CLI commands from docker-compose.  To do that, override the amazon/aws-cli docker image's entrypoint and command to call a script.  In the script you can optionally add a command to keep the container up and running after the seed process is complete.

docker-compose:

  dynamoSeeder:
    working_dir: /home/ohell/server/src/migrations/
    volumes:
      - ./:/home/ohell/server
    entrypoint: ["/bin/bash"]
    command: -c "./init_database.sh --forever"
    image: amazon/aws-cli
    depends_on:
      - localdynamo
    links:
      - localdynamo
    environment:
      - AWS_ACCESS_KEY_ID=something
      - AWS_SECRET_ACCESS_KEY=something
      - REGION=us-east-1
      - DYNAMODB_ENDPOINT=http://localdynamo:8000
  localdynamo:
    command: -jar DynamoDBLocal.jar -inMemory
    user: root
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb-local
    ports:
      - "8000:8000"
    volumes:
      - "./test/testDataVolume:/home/dynamodblocal/data"
    working_dir: /home/dynamodblocal

And the script to seed dynamodb-local:

#!/bin/bash

export AWS_SECRET_ACCESS_KEY="something"
export AWS_ACCESS_KEY_ID="something"
export AWS_REGION=us-east-1

aws dynamodb create-table --cli-input-json file://000_create_ohell_dev.json --endpoint-url http://localdynamo:8000
aws dynamodb batch-write-item --request-items file://001_initialseed.json --endpoint-url http://localdynamo:8000
aws dynamodb update-table --cli-input-json file://002_create_GSI1.json --endpoint-url http://localdynamo:8000

echo "database seed complete"
if [ "$1" == "--forever" ]
then
    echo "staying up to keep dependent services happy"
    sleep 10000
fi

from the beginning:

I'm using DynamoDB.  In order to run local integration tests, I am using dynamodb-local, which comes packaged as the docker image amazon/dynamodb-local.  Performance is not good.
I had been starting it up like this in my docker-compose file:

  localdynamo:
    command: -jar DynamoDBLocal.jar -sharedDb -dbPath ./data
    user: root
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb-local
    ports:
      - "8000:8000"
    volumes:
      - "./test/testDataVolume:/home/dynamodblocal/data"
    working_dir: /home/dynamodblocal

The approach I'm taking here is: I have a source-controlled binary file (./test/testDataVolume) that I use to bootstrap the database; basically it's a snapshot that has a couple of tables defined along with a master data record.  No heavy lifting, but it requires three AWS CLI calls to set up.  Initially I tried just making the three calls once a fresh dynamodb-local container was up, but it got tricky and I settled on this approach.

Problem is, it is slow as dirt, and my tests are taking too long to run.  There is an eventual-consistency delay I have had to do all kinds of stuff to work around; that is, the delay between when I do a documentClient put and when the record is readable.  I've even written an exponential backoff routine, so a test can keep retrying a record read until some time limit, and I'm seeing that fail after 18 seconds.  Look at this TypeScript example:


const defaultWait = 10
const maxWait = 100

// promise-based sleep helper (assumed implementation; the obvious one)
const waitForDb = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// retry func (bound to thisToUse) with exponentially growing waits until
// targetMatcher accepts the result or the backoff wait exceeds maxWaitMs
export async function backOffAsyncUntilResolveToTarget<T>(targetMatcher: Function, thisToUse: any, func: Function, params: any[], maxWaitMs = maxWait, waitMs = 0): Promise<T>{
    let result!: T
    try{
        await waitForDb(waitMs)
        result = await func.call(thisToUse, ...params)
        if(!targetMatcher(result)){
            throw new Error('did not match value')
        }
    } catch(ex) {
        if(waitMs <= maxWaitMs){
            result = await backOffAsyncUntilResolveToTarget<T>(targetMatcher, thisToUse, func, params, maxWaitMs, (waitMs || defaultWait) * 2)
        } else throw new Error(`Could not match within ${maxWaitMs}ms`)
    }
    return result
}
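
And here's roughly how a test calls it.  The ChatRoom type and chatRoomRepo below are hypothetical stand-ins for my real repository code, but the call shape is real:

// hypothetical record type and repository, just to make the example compile
interface ChatRoom { channelName: string }
declare const chatRoomRepo: { getChatRoom(pk: string, sk: string): Promise<ChatRoom | undefined> }

// retry reading a record we just wrote, waiting 10ms, 20ms, 40ms, ... between
// attempts, until the matcher passes or the backoff wait exceeds the cap
const room = await backOffAsyncUntilResolveToTarget<ChatRoom>(
    (r: ChatRoom | undefined) => !!r && r.channelName === 'front', // target matcher
    chatRoomRepo,                  // `this` for the call
    chatRoomRepo.getChatRoom,      // function to retry
    ['chrm#front', 'chrm#front'],  // its arguments (pk, sk)
    18000                          // stop retrying once the wait passes this
)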

If you have to do this so that your local integration tests pass, you should see this as a problem.  You don't deserve to live like this.

I have read hints that the performance issue is because dynamodb-local uses SQLite in the backend, and if you use the -sharedDb option instead of in-memory, it is making slow-as-tar file system calls.  So, we want to see if we can speed the whole thing up and make tests more reliable by running it in memory and setting up the tables and seed data before each test run.

So, we're starting here in sharedDb mode:

  localdynamo:
    command: -jar DynamoDBLocal.jar -sharedDb -dbPath ./data
    user: root
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb-local
    ports:
      - "8000:8000"
    volumes:
      - "./test/testDataVolume:/home/dynamodblocal/data"
    working_dir: /home/dynamodblocal

Let's change that to in-memory.

  localdynamo:
    command: -jar DynamoDBLocal.jar -inMemory
    user: root
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb-local
    ports:
      - "8000:8000"

Now I'm going to try the approach of running my commands against that database.  This works from my dev WSL2 Ubuntu instance, because I have a reasonable shell with aws-cli installed:

docker-compose up -d localdynamo
./init_database.sh

Where init_database.sh is

#!/bin/bash

aws dynamodb create-table --cli-input-json file://000_create_ohell_dev.json --endpoint-url http://localhost:8000
aws dynamodb batch-write-item --request-items file://001_initialseed.json --endpoint-url http://localhost:8000
aws dynamodb update-table --cli-input-json file://002_create_GSI1.json --endpoint-url http://localhost:8000
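
At this point you can sanity-check the seed from the same shell with a quick scan against the local endpoint:

aws dynamodb scan --table-name ohell_dev --endpoint-url http://localhost:8000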

And the script reads these JSON files:

Create the table:

{
    "TableName": "ohell_dev",
    "KeySchema": [
      { "AttributeName": "pk", "KeyType": "HASH" },
      { "AttributeName": "sk", "KeyType": "RANGE" }
    ],
    "AttributeDefinitions": [
      { "AttributeName": "pk", "AttributeType": "S" },
      { "AttributeName": "sk", "AttributeType": "S" }
    ],
    "ProvisionedThroughput": {
      "ReadCapacityUnits": 5,
      "WriteCapacityUnits": 5
    }
}

Bootstrap this one record:

{   "ohell_dev": [
    {
        "PutRequest": {
            "Item": {
                "pk": {"S":"chrm#front"},
                "sk": {"S":"chrm#front"},
                "kind": {"S":"chatRoom"},
                "channelName": {"S":"front"},
                "ownerUserId": {"S":"0"},
                "members": {"L":[]},
                "isprivate": {"B": "false"},
                "LogLength": {"N":"100"},
                "LogDurationHours": {"N":"24"},
                "chatLog": {"L":[]}
            }
        }
    }]
}

Create a GSI:

{
    "TableName": "ohell_dev",
    "AttributeDefinitions":[  
        {  
           "AttributeName":"gsi1_pk",
           "AttributeType":"S"
        },
        {  
           "AttributeName":"gsi1_sk",
           "AttributeType":"S"
        }
     ],
    "GlobalSecondaryIndexUpdates" : [
        {
            "Create": {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName":"gsi1_pk","KeyType":"HASH"},
                    {"AttributeName":"gsi1_sk","KeyType":"RANGE"}
                ],
                "Projection":{
                    "ProjectionType":"ALL"
                },
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 1,
                    "WriteCapacityUnits": 1
                }
            }
        }
    ]
}

Ok, so tests fail with this kind of error:
[ResourceNotFoundException: Cannot do operations on a non-existent table]

The reason for this is that we're no longer using a shared database.  Without -sharedDb, dynamodb-local keeps a separate database per access key ID and region, and I had seeded the db from my console with different credentials than the ones my tests use.
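
You can see the namespacing for yourself: hit the same local endpoint with two different (fake) access keys, and each one sees its own database:

# same endpoint, two fake access keys: two separate databases
AWS_ACCESS_KEY_ID=keyA AWS_SECRET_ACCESS_KEY=x aws dynamodb list-tables \
    --endpoint-url http://localhost:8000 --region us-east-1
AWS_ACCESS_KEY_ID=keyB AWS_SECRET_ACCESS_KEY=x aws dynamodb list-tables \
    --endpoint-url http://localhost:8000 --region us-east-1

So the fix is to pin the seeding credentials in the script itself: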

init_database.sh

#!/bin/bash

export AWS_SECRET_ACCESS_KEY="something"
export AWS_ACCESS_KEY_ID="something"
export AWS_REGION=us-east-1

aws dynamodb create-table --cli-input-json file://000_create_ohell_dev.json --endpoint-url http://localhost:8000
aws dynamodb batch-write-item --request-items file://001_initialseed.json --endpoint-url http://localhost:8000
aws dynamodb update-table --cli-input-json file://002_create_GSI1.json --endpoint-url http://localhost:8000

Yay!  All tests pass.  So, now I want to run these commands from within docker-compose.  Tricky facts:

Making your own docker image with a reasonable shell and aws-cli is a pain.
The existing amazon/aws-cli just wants to run a single command and then exit.
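
You can see that second point for yourself; the image's entrypoint is the aws binary, so whatever you pass as a command becomes arguments to aws, and the container exits as soon as the call returns:

# the entrypoint is `aws`, so this runs `aws --version` and exits
docker run --rm amazon/aws-cli --version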

Luckily we can override the amazon/aws-cli default behavior by doing this kind of thing in our docker-compose:

  dynamoSeeder:
    working_dir: /home/ohell/server/src/migrations/
    volumes:
      - ./:/home/ohell/server
    entrypoint: ["/bin/bash"]
    command: -c ./init_database.sh
    image: amazon/aws-cli
    depends_on:
      - localdynamo
    links:
      - localdynamo
    environment:
      - AWS_ACCESS_KEY_ID=something
      - AWS_SECRET_ACCESS_KEY=something
      - REGION=us-east-1
      - DYNAMODB_ENDPOINT=http://localdynamo:8000

This almost gets us there.  The only problem now is that our dynamoSeeder exits with a success code (status 0) as soon as it's done.  This makes our test command tricky - see package.json:

"testdocker": "git checkout test/testDataVolume && docker-compose up --exit-code-from ohellTest",

We want the exit code from the test runner, but --exit-code-from implies --abort-on-container-exit, so the dynamoSeeder shutting down takes the whole stack down with it.
We need to feed in some kind of command to run after the seed calls that will keep the seeder running after its work is complete.  What I finally settled on was modifying the database init script as follows:

#!/bin/bash

export AWS_SECRET_ACCESS_KEY="something"
export AWS_ACCESS_KEY_ID="something"
export AWS_REGION=us-east-1

aws dynamodb create-table --cli-input-json file://000_create_ohell_dev.json --endpoint-url http://localdynamo:8000
aws dynamodb batch-write-item --request-items file://001_initialseed.json --endpoint-url http://localdynamo:8000
aws dynamodb update-table --cli-input-json file://002_create_GSI1.json --endpoint-url http://localdynamo:8000

echo "database seed complete"
if [ "$1" == "--forever" ]
then
    echo "staying up to keep dependent services happy"
    sleep 10000
fi

(edit: I originally had tail -f /dev/null instead of sleep 10000.  That is not what you want.)

Then we change the dynamo-seeder command line in docker-compose to this:

  dynamoSeeder:
    working_dir: /home/ohell/server/src/migrations/
    volumes:
      - ./:/home/ohell/server
    entrypoint: ["/bin/bash"]
    command: -c "./init_database.sh --forever"
    image: amazon/aws-cli
    depends_on:
      - localdynamo
    links:
      - localdynamo
    environment:
      - AWS_ACCESS_KEY_ID=something
      - AWS_SECRET_ACCESS_KEY=something
      - REGION=us-east-1
      - DYNAMODB_ENDPOINT=http://localdynamo:8000

Voila.  There is assuredly a smoother way to do this with better docker or bash usage; let me know if you have any ideas.
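
One idea I haven't tried: newer docker compose versions support completion conditions on depends_on, which should let ohellTest wait for the seeder to finish instead of the seeder pretending to be a long-running service.  A sketch, assuming a compose version that supports service_completed_successfully (I haven't verified how it interacts with --exit-code-from):

  ohellTest:
    depends_on:
      dynamoSeeder:
        # start the tests only after the seeder exits with status 0
        condition: service_completed_successfully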