Scanning uploaded files in S3 Buckets for malware

A laptop screen showing a prompt after detecting a malicious file
Photo by Ed Hardie / Unsplash

Working on a recent project, we looked at various ways to ensure that our system was as secure as possible.

It's great that today, we can easily use cloud storage services like AWS S3, Azure Storage Accounts and Google Cloud Storage without having to worry about managing all of the underlying hardware, operating system services and things like geo-redundant backups. But at the same time it's also easy to believe that using Platform as a Service components will handle all of the security and management "stuff" for you. Unfortunately, ensuring that the files you're storing are protected against malware and viruses isn't handled automatically*.

The product we built for our client allows users to upload files that are stored in AWS S3 Buckets. We wanted to protect the system and other users from the threat of malicious files being uploaded (inadvertently or intentionally). This post describes the approach we took and lessons learned along the way that you might find helpful if you're in a similar situation.

* OK, Azure does include Microsoft Defender for Storage, but it's not a built-in service, must be separately enabled and includes extra costs.

Choosing a service to scan AWS S3 Buckets

We considered several services that can scan files in S3 Buckets, including Trend Cloud One File Storage Security and Serverless ClamScan.

Our priorities for selecting a service:

  • We preferred a Software as a Service solution over self-hosted solutions
  • We wanted to keep operating costs low given our client is a start-up
  • We wanted something that could be integrated into our deployment processes, which use AWS CDK
  • We wanted a service that provides support and is backed by a Service Level Agreement (SLA) to support our client if something goes wrong
  • We wanted file scanning to be a "black box": files are sent to the service and results are returned without us having to know about its internals

In the end, we settled on Trend Cloud One File Storage Security as it ticked all the boxes for us.

Serverless ClamScan gets an honourable mention. As an open source CDK construct it could plug easily into our environment, but we were concerned about ongoing support and the lack of an SLA. See this post for a good description.

Integrating with File Storage Security

Trend Micro provides good documentation describing their platform and integrating it into a solution. Briefly, the solution includes the following components:

  • Scanner stack - the component that performs the file scan to detect bad things. You only need one Scanner stack in your environment.
  • Storage stack - this component is responsible for detecting new files uploaded to your S3 Bucket, generating a pre-signed URL and passing the URL to the Scanner stack. You need one Storage stack for each S3 Bucket you want to scan, but your Storage stacks can share a single Scanner stack.
  • Post scan plug-ins - one or more AWS Lambdas that process scan results. Trend Micro provides some examples and ready-to-use plugins and you can also create your own.

The following diagram is a simplified high-level architecture of the file scanning solution we implemented using File Storage Security.

An architecture diagram showing an S3 Bucket integrated with Trend Cloud One File Storage Security Components (Storage Stack and Scanner Stack). Results from the scan are passed to an AWS Lambda, named 'malware-scan-lambda'. If the scan fails (i.e. the file is malicious) the Lambda deletes the offending file and sends a notification to Slack via a WebHook.
High-level File Security Scanning logical architecture

When files are added to the user-files S3 Bucket, they are picked up by the Storage stack (which listens for s3:ObjectCreated:* events). The Storage stack includes an AWS SNS topic that you can subscribe to in order to receive the results of a file scan.

In our solution, we created a simple AWS Lambda, malware-scan-lambda, to process file scan results. If a scan result indicates that the file is malicious, we delete it from its source Bucket and send a notification to Slack through a WebHook. Naturally, you can take whatever action you want with the scan result. Trend also provides some example Lambda implementations that you can use as a starting point or plug directly into your solution.
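
As a rough sketch of that decision, the function below takes the relevant parts of a scan result and returns the actions to perform, leaving the actual S3 delete and Slack call to the caller. The ScanSummary and Action types and the decideActions name are made up for this illustration and are not part of Trend's plug-ins. It also assumes that file_name holds the S3 object key.

```typescript
// Hypothetical types for illustration only.
interface Finding {
  malware: string
  type: string
}

interface ScanSummary {
  bucket: string
  file_name: string // assumed to be the S3 object key
  findings: Finding[]
}

type Action =
  | { kind: 'delete'; bucket: string; key: string }
  | { kind: 'notify'; file: string; findings: Finding[] }

// Decide what to do with a scan result without performing any side effects.
function decideActions(scan: ScanSummary): Action[] {
  if (scan.findings.length === 0) {
    return [] // clean file: nothing to do
  }
  return [
    { kind: 'delete', bucket: scan.bucket, key: scan.file_name },
    { kind: 'notify', file: scan.file_name, findings: scan.findings },
  ]
}

// Example: a malicious file produces a delete action followed by a notify action.
const actions = decideActions({
  bucket: 'user-files',
  file_name: 'i-am-virus.txt',
  findings: [{ malware: 'Eicar_test_file', type: 'Virus' }],
})
console.log(actions.map((a) => a.kind).join(', '))
```

The Lambda handler can then loop over the returned actions and call the S3 client or the Slack WebHook for each one, which keeps the decision itself easy to unit test.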

Implementation

Trend Micro already have excellent documentation for File Storage Security that covers their architecture, deployment options and configuration so I don't want to repeat that information here. Instead I think it is helpful to talk about some of the challenges we encountered and how we resolved them.

Deployment using AWS CDK

Our team uses CDK for deploying our infrastructure to AWS. We wanted to incorporate File Storage Security into our existing pipelines and code with minimal fuss.

Trend hosts a GitHub repository that includes the CloudFormation templates defining the resources in their Scanner and Storage stacks.

Instead of re-creating these templates in CDK, you can import a CloudFormation template directly into a CDK project using cloudformation-include.CfnInclude.

🔎
Read more about importing templates from the AWS CDK docs

To do this, we copied Trend's Scanner and Storage Stack CloudFormation templates to a templates directory in our project and created two classes: ScannerStack and StorageStack.

The following code shows our ScannerStack class and an example of how we import the CloudFormation template using CDK.

import * as cdk from 'aws-cdk-lib'
import * as cfninc from 'aws-cdk-lib/cloudformation-include'
import type { Construct } from 'constructs'

export interface ScannerStackProps extends cdk.StackProps {
  trendMicroManagementAccount: string
  trendMicroCloudOneRegion: string
  trendMicroExternalId: string
}

export default class ScannerStack extends cdk.Stack {
  public readonly queueUrl: string

  constructor(scope: Construct, id: string, props: ScannerStackProps) {
    super(scope, id, props)

    // Import Trend's CloudFormation template as-is and supply its parameters
    const template = new cfninc.CfnInclude(this, 'scanner-stack-template', {
      templateFile: 'templates/FSS-Scanner-Stack.template',
      parameters: {
        ExternalID: props.trendMicroExternalId,
        TrendMicroManagementAccount: props.trendMicroManagementAccount,
        CloudOneRegion: props.trendMicroCloudOneRegion,
      },
    })

    // Expose the scanner queue URL so it can be passed to the StorageStack
    this.queueUrl = template.getOutput('ScannerQueueURL').value
  }
}
ScannerStack class implementation that shows how to import a CloudFormation template
👉
The CloudFormation template includes an Output named ScannerQueueURL that we need to pass into the StorageStack. This is exposed as a public field, queueUrl.

For the Storage Stack template, we followed the same approach by creating a class called StorageStack and importing the template using cloudformation-include.CfnInclude.

Future enhancement

Deploying the CDK components is one part of the process. The other part is configuring File Storage Security so that it can work with your AWS infrastructure. For our implementation, we manually configured ScannerStackManagementRoleARN and StorageStackManagementRoleARN (as described in the Trend documentation).

Trend Micro publish an API that can be used to automate these manual configuration steps. In a future version we intend to use this to fully automate the deployment process end-to-end.

Working with scan results

Trend's File Storage Security allows you to process scan results through a custom AWS Lambda by subscribing to an AWS SNS Topic.

👉
When deploying the Storage Stack (as described above) the CloudFormation template defines an Output named ScanResultTopicARN that provides a reference to the Topic to subscribe to.

Our solution included deployment of a Lambda stack that subscribes to the topic passed into the stack via a parameter.

import * as cdk from 'aws-cdk-lib'
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as sns from 'aws-cdk-lib/aws-sns'
import * as subscriptions from 'aws-cdk-lib/aws-sns-subscriptions'
import type { Construct } from 'constructs'

export interface MalwareScanLambdaStackProps extends cdk.StackProps {
  lambdaCode: lambda.Code
  scanResultTopic: sns.ITopic
  bucketToScan: string
}

export default class MalwareScanLambdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: MalwareScanLambdaStackProps) {
    super(scope, id, props)

    const malwareScanLambda = new lambda.Function(this, `malware-scan-lambda`, {
      functionName: `${id}-malware-scan-lambda`,
      runtime: lambda.Runtime.NODEJS_16_X,
      code: props.lambdaCode,
      handler: 'index.handler',
    })

    props.scanResultTopic.addSubscription(new subscriptions.LambdaSubscription(malwareScanLambda))
  }
}
Deploying a Lambda that subscribes to an AWS SNS Topic

The Lambda handler you create should be of type aws-lambda.SNSHandler and take a parameter of type aws-lambda.SNSEvent. The following example is a minimal implementation of a Lambda to process a scan result:

import type * as lambda from 'aws-lambda'
import type { SNSHandler } from 'aws-lambda'

interface ScanEvent {
  timestamp: number
  sqs_message_id: string
  bucket: string
  scan_start_timestamp: number
  scanner_status: number
  scanner_status_message: string
  file_name: string
  file_url: string
  scanning_result: {
    TotalBytesOfFile: number
    Findings: ScanEventFinding[]
    Error: string
    Codes: number[]
  }
}

interface ScanEventFinding {
  malware: string
  type: string
}

export const handler: SNSHandler = async (event: lambda.SNSEvent): Promise<void> => {
  for (const eventRecord of event.Records) {
    // Each SNS record carries one scan result as a JSON string
    const scanEvent = JSON.parse(eventRecord.Sns.Message) as ScanEvent

    // Guard against payloads that lack a findings list
    const findings = scanEvent?.scanning_result?.Findings ?? []
    if (findings.length > 0) {
      // Do something interesting
    }
  }
  console.log('Finished processing the event')
}
A minimal Lambda that handles SNS event messages for processing scan result messages
👉
It was difficult to find information that describes the structure of a scan result. To help others, here is the documentation from Trend.
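
Because the result arrives as a JSON string, a small defensive parser can keep the handler from failing on an unexpected payload. The sketch below checks only the fields used in this post. parseFindings is a hypothetical helper, not something Trend ships, and treating a missing Findings list as "unrecognised payload" is an assumption you may want to change.

```typescript
interface ScanEventFinding {
  malware: string
  type: string
}

// Return the findings from a raw SNS message body, or undefined if the
// payload does not have the expected shape.
function parseFindings(message: string): ScanEventFinding[] | undefined {
  let parsed: unknown
  try {
    parsed = JSON.parse(message)
  } catch {
    return undefined // not valid JSON
  }
  if (typeof parsed !== 'object' || parsed === null) return undefined

  const result = (parsed as { scanning_result?: unknown }).scanning_result
  if (typeof result !== 'object' || result === null) return undefined

  const findings = (result as { Findings?: unknown }).Findings
  if (!Array.isArray(findings)) return undefined

  // Keep only entries that look like { malware, type }
  return findings.filter(
    (f): f is ScanEventFinding =>
      typeof f === 'object' &&
      f !== null &&
      typeof (f as ScanEventFinding).malware === 'string' &&
      typeof (f as ScanEventFinding).type === 'string',
  )
}

// Example usage with a hand-written message body
const sample = JSON.stringify({
  scanning_result: { Findings: [{ malware: 'Eicar_test_file', type: 'Virus' }] },
})
console.log(parseFindings(sample)?.length)
```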

Scanning existing files (not just newly added ones)

The solution we created only scans files that are added to an S3 Bucket. In our project, we were building a new, greenfield application so this wasn't a concern. But what do you do if you're trying to protect an existing Bucket that already contains files?

Trend documents some approaches on how to do this, including the ability to run scheduled scans over your files.

Testing

How do you go about testing that malware scanning is working correctly? I didn't really want to get my machine infected with anything nor did I want to spread any malware in our company or around our clients.

Fortunately, there's a solution that is perfectly safe. The European Institute for Computer Antivirus Research (EICAR) has developed an anti-virus test file that you can use to trigger a "safe positive" (a term I've just made up). You can safely pass this file around and upload it to an S3 Bucket to cause File Storage Security to detect a file with a "virus".

👉
Your operating system or third-party anti-virus software will probably get in the way when you're trying to work with the test EICAR file. You will need to disable virus scanning so that you can upload the EICAR file to your Bucket - just remember to turn it back on when you're done testing!
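
One way to reduce friction with local anti-virus tools is to build the test content in memory just before uploading, rather than keeping the file on disk. The sketch below assembles the standard 68-character EICAR test string from two halves so that the source file is less likely to be flagged. Whether that works depends on your scanner, so treat it as an assumption. eicarTestContent is a name made up for this example, and the upload itself is left to whatever S3 client you use.

```typescript
// The EICAR test string, split in two and joined at runtime.
// Note the escaped backslash in the first half.
const EICAR_PART_1 = 'X5O!P%@AP[4\\PZX54(P^)7CC)7}$'
const EICAR_PART_2 = 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'

// Return the complete test string, ready to use as the body of an upload.
function eicarTestContent(): string {
  return EICAR_PART_1 + EICAR_PART_2
}

const testBody = eicarTestContent()
console.log(`EICAR test content is ${testBody.length} characters long`)
```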

In our solution, we trigger a notification in Slack. When uploading an instance of the EICAR test file, the following notification is generated:

The image contains the following notification text: Malware detected in uploaded file i-am-virus.txt Malware found: Eicar_test_file (Virus) Scan timestamp: Wed Mar 15 2023 07:06:14 GMT+0000 (Coordinated Universal Time) Source S3 Bucket: [Redacted]
A screenshot of a Slack notification message displaying details of a malware scan result.
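
For reference, a message like the one above could be assembled along these lines. This is a sketch rather than our exact Lambda code: buildSlackPayload is a hypothetical helper, and it assumes that timestamp is expressed in seconds since the Unix epoch. It relies on Slack incoming WebHooks accepting a JSON body with a text field.

```typescript
interface Finding {
  malware: string
  type: string
}

interface ScanNotification {
  file_name: string
  bucket: string
  timestamp: number // assumed: seconds since the Unix epoch
  findings: Finding[]
}

// Build the JSON body for a Slack incoming WebHook.
function buildSlackPayload(scan: ScanNotification): { text: string } {
  const found = scan.findings.map((f) => `${f.malware} (${f.type})`).join(', ')
  // Date#toString uses the runtime's local time zone
  const scannedAt = new Date(scan.timestamp * 1000).toString()
  const lines = [
    `Malware detected in uploaded file ${scan.file_name}`,
    `Malware found: ${found}`,
    `Scan timestamp: ${scannedAt}`,
    `Source S3 Bucket: ${scan.bucket}`,
  ]
  return { text: lines.join('\n') }
}

// Example usage
const payload = buildSlackPayload({
  file_name: 'i-am-virus.txt',
  bucket: 'user-files',
  timestamp: 1678863974,
  findings: [{ malware: 'Eicar_test_file', type: 'Virus' }],
})
console.log(payload.text)
```

The resulting object can be serialised with JSON.stringify and sent as the body of a POST request to the WebHook URL.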

Wrapping up

I found it interesting to put this solution together and it was the first time I'd dived deeper into virus and malware scanning beyond using antivirus software on my machine.

I hope this is useful to you and helps you protect the files in your AWS S3 solutions.