Saturday, May 20, 2017

AWS PowerShell Tools Snippets: S3 Multipart Upload Cleanup

My company does quite a bit with AWS S3. We use it to store static files and images, we push backups to it, we use it to deliver application artifacts, and the list goes on.

When you push a significant amount of data to and from S3, you're bound to experience some network interruptions that could stop an upload. Most of the time S3 clients will recover on their own, but there are some cases where they might struggle.

One such case is when you are pushing a large file using S3 multipart uploads. An interrupted upload can leave you with pieces of files sitting in S3 that are not useful for anything, but are still taking up space and costing you money. We recently worked with AWS support to get a report of how many incomplete uploads we had sitting around, and it was in the double-digit terabytes!
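
If you'd like a rough look at this yourself, something along these lines might help. It is only a sketch and not how we got our numbers (those came from AWS support). It assumes the Get-S3MultipartUpload cmdlet from the AWS PowerShell tools returns a response object with a MultipartUploads property, and that your credentials and region are already configured. It counts in-progress uploads per bucket; it does not report how much space their parts use.

```powershell
# Sketch: count in-progress multipart uploads per bucket.
# Assumption: Get-S3MultipartUpload returns an object exposing .MultipartUploads.
# It may only return the first page of results (up to 1000 uploads) per call.
foreach ($bucket in Get-S3Bucket) {
    $response = Get-S3MultipartUpload -BucketName $bucket.BucketName

    # Drop nulls so a bucket with no uploads reports a count of 0
    $uploads = @($response.MultipartUploads | Where-Object { $_ })

    Write-Host "$($bucket.BucketName): $($uploads.Count) in-progress multipart upload(s)"
}
```

Each entry should carry the object key and the time the upload was initiated, which is enough to spot uploads that have been abandoned for weeks.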

We started looking for a way to clean them up and found that AWS had recently added a way to manage them with a bucket lifecycle policy. The AWS documentation has some details, and there's an example of how to create this policy with the AWS CLI towards the bottom of that doc.

We decided to recreate this functionality in PowerShell using the Write-S3LifecycleConfiguration cmdlet, to make it a little easier to apply the policy to all of the buckets in our account at once.

It took a little reverse engineering, because the Write-S3LifecycleConfiguration cmdlet doesn't have many useful examples. In the end I wound up creating the policy I wanted in the AWS console, and then using Get-S3LifecycleConfiguration to see how AWS represents the policies in its .NET class structure.
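
That inspection step looks roughly like the sketch below. The bucket name is a placeholder, and it assumes the bucket already has a rule created in the console with an abort-incomplete-upload action and a filter; if either is missing, the GetType() calls will fail on a null value.

```powershell
# "my-example-bucket" is a hypothetical name; use a bucket whose rule you created in the console
$config = Get-S3LifecycleConfiguration -BucketName "my-example-bucket"

# Dump every property of each rule to see which objects hang off it
$config.Rules | Format-List *

# Print the .NET type names of the nested objects you will need to construct yourself
$first = $config.Rules | Select-Object -First 1
$first.AbortIncompleteMultipartUpload.GetType().FullName
$first.Filter.LifecycleFilterPredicate.GetType().FullName
```

The type names printed at the end are the ones to pass to New-Object when building the same rule by hand.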

It seems to me that there are a lot of classes between you and creating this policy, but that could mean that AWS has future plans to make these policies even more dynamic and useful.

The code I came up with at the end is below. Hope it's helpful!


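# Load the AWS module so the Amazon.S3.Model types can be resolved
# (AWSPowerShell here; use whichever AWS Tools module you have installed)
Import-Module AWSPowerShell

# Build a lifecycle rule that aborts multipart uploads left incomplete for 7 days
$rule = New-Object -TypeName Amazon.S3.Model.LifecycleRule
$incompleteUploadCleanupDays = New-Object -TypeName Amazon.S3.Model.LifecycleRuleAbortIncompleteMultipartUpload
$incompleteUploadCleanupDays.DaysAfterInitiation = 7
$rule.AbortIncompleteMultipartUpload = $incompleteUploadCleanupDays
$rule.Id = "WholeBucketPolicy"
$rule.Status = "Enabled"

# The filter wraps a prefix predicate; no prefix is set, so the rule is meant to cover the whole bucket
$prefixPredicate = New-Object -TypeName Amazon.S3.Model.LifecyclePrefixPredicate
$lifecycleFilter = New-Object -TypeName Amazon.S3.Model.LifecycleFilter
$lifecycleFilter.LifecycleFilterPredicate = $prefixPredicate
$rule.Filter = $lifecycleFilter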

# Add the rule to every bucket that doesn't already have it
foreach ($bucket in Get-S3Bucket) {
    Write-Host "Bucket name: $($bucket.BucketName)"
    $existingConfig = Get-S3LifecycleConfiguration -BucketName $bucket.BucketName

    # Collect the current rules; starting from @() guards against a bucket with no lifecycle configuration
    $rules = @()
    if ($existingConfig -and $existingConfig.Rules) { $rules += $existingConfig.Rules }

    if ($rules | Where-Object { $_.Id -eq $rule.Id }) {
        Write-Host "Policy $($rule.Id) already exists, skipping bucket"
    }
    else {
        Write-Host "Rule not found, adding"
        # Write back the existing rules plus the new one, so nothing already configured is dropped
        $rules += $rule
        Write-S3LifecycleConfiguration -BucketName $bucket.BucketName -Configuration_Rule $rules
    }
}
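
To double-check the result afterwards, a loop like this could report which buckets ended up with the rule. It is a sketch that simply looks for the "WholeBucketPolicy" ID used above and assumes the same credentials and region as the script.

```powershell
# Sketch: report, per bucket, whether the cleanup rule is present
foreach ($bucket in Get-S3Bucket) {
    $config = Get-S3LifecycleConfiguration -BucketName $bucket.BucketName
    $match = $config.Rules | Where-Object { $_.Id -eq "WholeBucketPolicy" } | Select-Object -First 1

    if ($match) {
        $days = $match.AbortIncompleteMultipartUpload.DaysAfterInitiation
        Write-Host "$($bucket.BucketName): aborts incomplete uploads after $days day(s), status $($match.Status)"
    }
    else {
        Write-Host "$($bucket.BucketName): rule not found"
    }
}
```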