One of my (not so) favorite things to do when learning cloud is discovering gaps in documentation.  When I wanted to import DynamoDB data, I thought it would be as simple as using the format documented here:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataImport.Format.html

It turns out it needs to be a bit different from that.  At first, I thought it might be as easy as making a backup and seeing what the format looked like. Unfortunately, the DynamoDB backups had one record per file (at least with the number of rows I had).  That really doesn't scale well if you have thousands of records.

I found the data actually needs to look like the below in order to import using the built-in import function:

{"Item": {"id": {"N": "0"}, "admin": {"S": "fname"}, "type": {"S": "lname"}}}
{"Item": {"id": {"N": "1"}, "admin": {"S": "fname"}, "type": {"S": "lname"}}}

To get it to this format, I did have to jump through a few hoops.  First, I used the 'export-dynamodb' script found here:

https://github.com/truongleswe/export-dynamodb

This script also exports each entry as its own file, so I had to parse each file, convert the multi-line JSON into a single line like the above, and append everything to one file. Here is the sample script I used to do that:

import json
import os

# Combine every exported file into a single JSON-lines file for import.
with open('<table>.json', 'w') as out_file:
    for filename in os.listdir('./dump/<table>/data'):
        # Each exported file holds an "Items" array of DynamoDB-JSON records.
        with open('./dump/<table>/data/' + filename, 'r') as f:
            data = json.loads(f.read())

        for record in data['Items']:
            # Wrap each record in {"Item": ...} and write it on its own line.
            out_file.write(json.dumps({'Item': record}) + '\n')

After the records were in one text file, I could then use gzip to compress the file to save some space.
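
If you'd rather keep everything in Python, the standard library can handle the compression and boto3 the upload. This is just a sketch, with the same '<table>' placeholder as above and a made-up bucket name and key:

import gzip
import shutil

import boto3

# Compress the combined file; the import supports GZIP input directly.
with open('<table>.json', 'rb') as src, gzip.open('<table>.json.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)

# Upload the compressed file under the prefix the import will read from.
s3 = boto3.client('s3')
s3.upload_file('<table>.json.gz', 'my-import-bucket', 'dynamodb-import/<table>.json.gz')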
