Exploring CDK for Terraform for .NET

May 15, 2021 · 4 min read

Los Angeles, CA

Overview

Both AWS CDK and Terraform aim to solve a similar problem: alleviating some of the infrastructure management challenges with code. CDK supports several general-purpose languages (GPLs), including C#, Python, and TypeScript, while Terraform uses its own configuration language, HCL. While CDK can only create AWS resources, Terraform supports virtually every cloud provider, granting the ability to write code that deploys to multiple public clouds at once. Last year, HashiCorp and AWS announced a project called CDK for Terraform, aiming to grant the best of both worlds (i.e., support for GPLs, multi-cloud, etc.).

Nuances and Limitations

In addition to the programming language features of AWS CDK, there's a construct library with three levels:

L1 (level one) constructs are representations of CloudFormation resources
L2 (level two) constructs provide defaults and boilerplate to simplify the code
Patterns are the highest level: a single construct that wraps many resources configured together (e.g., a Lambda RESTful API)

While CDK for Terraform utilizes the AWS construct programming model, it does not share the same construct library as CDK. Note that CDK for Terraform stacks only support resources from Terraform providers.

GitHub Repository

You can find a complete working example here.

Installing the Tools and Scaffolding the .NET Solution

The following command line tools are required for getting started:

Terraform (0.12+)
Node.js (12.16+)
AWS CLI (specifically the credentials)

First, install the cdktf CLI:

npm install -g cdktf-cli
# 0.3
cdktf --version

After that, create the .NET project using the cdktf CLI:

mkdir resources
cd resources
# the --local flag refers to local Terraform state management
cdktf init --template=csharp --local

This action creates several files, including a cdktf.json file. Inside this configuration file, specify the AWS provider.

resources/cdktf.json
{
  "language": "csharp",
  "app": "dotnet run -p MyTerraformStack.csproj",
  "terraformProviders": ["aws@~> 2.0"],
  "terraformModules": [],
  "context": {
    "excludeStackIdFromLogicalIds": "true",
    "allowSepCharsInLogicalIds": "true"
  }
}

After adding the provider configuration, generate the provider objects using the following command:

cdktf get

The generated objects are stored in the newly created .gen/ folder. Add the generated project as a reference in MyTerraformStack.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="HashiCorp.Cdktf" Version="0.3.0" />
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include=".gen\aws\aws.csproj" />
  </ItemGroup>

</Project>

Lastly, initialize the AwsProvider object in the Main.cs file.

resources/Main.cs
using System;
using Constructs;
using HashiCorp.Cdktf;
// AWS provider objects generated by cdktf get command
using aws;

namespace MyCompany.MyApp
{
    class MyApp : TerraformStack
    {
        public MyApp(Construct scope, string id) : base(scope, id)
        {
            // initialize the AWS provider
            // located in the .gen/ folder
            new AwsProvider(this, "aws", new AwsProviderConfig {
                Region = "us-east-1"
            });
        }

        public static void Main(string[] args)
        {
            App app = new App();
            new MyApp(app, "resources");
            app.Synth();
            Console.WriteLine("App synth complete");
        }
    }
}

Adding Resources

As noted above, the resources will be created using the Terraform AWS provider. There are corresponding C# classes for each of the AWS resources specified by the provider. While writing code, the AWS provider documentation in conjunction with your IDE's autocomplete functionality is a powerful way to navigate the available resources. For this example, the code looks up the latest AMI for Ubuntu 20.04 and uses it to create an EC2 instance. Below the AwsProvider constructor call in the MyApp constructor, add a data source and an instance like so:

resources/Main.cs
// initialize the AWS provider
// located in the .gen/ folder
new AwsProvider(this, "aws", new AwsProviderConfig {
    Region = "us-east-1"
});

// https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami
DataAwsAmi dataAwsAmi = new DataAwsAmi(this, "aws_ami_ubuntu", new DataAwsAmiConfig()
{
    MostRecent = true,
    Filter = new []
    {
        new DataAwsAmiFilter()
        {
            Name = "name",
            Values = new [] { "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*" }
        },
        new DataAwsAmiFilter()
        {
            Name = "virtualization-type",
            Values = new [] { "hvm" }
        },
    },
    Owners = new [] { "099720109477" }
});

// https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance
Instance ec2Instance = new Instance(this, "aws_ec2_instance", new InstanceConfig()
{
    Ami = dataAwsAmi.ImageId,
    InstanceType = "t3.micro",
});

Note how this functionally behaves the same as the corresponding Terraform HCL with the power of a general-purpose programming language.

Deploying and Managing State

Once the data source and resource have been added, the project can be built and deployed, assuming that AWS credentials are available (i.e., aws configure has been run).

dotnet build
cdktf deploy
# when ready
cdktf destroy

The state resides in terraform.resources.tfstate.
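
To sanity-check what a deployment recorded, the state file can be inspected with a few lines of Node.js. The following is a minimal sketch and not part of the original project. It assumes the version 4 state layout (a top-level resources array whose entries carry mode, type, and name), and the sample object with its logical names is hypothetical.

```javascript
// Summarize a parsed Terraform state file (format version 4).
// `state` is the object produced by JSON.parse on the file's contents.
function summarizeState(state) {
  const resources = Array.isArray(state.resources) ? state.resources : [];
  return resources.map((r) => `${r.mode}: ${r.type}.${r.name}`);
}

// Hypothetical, heavily trimmed state for the stack in this post;
// a real state file carries many more attributes per resource.
const sampleState = {
  version: 4,
  resources: [
    { mode: 'data', type: 'aws_ami', name: 'aws_ami_ubuntu', instances: [] },
    { mode: 'managed', type: 'aws_instance', name: 'aws_ec2_instance', instances: [] },
  ],
};

console.log(summarizeState(sampleState));

// Against the real file (adjust the path to wherever the state file lives):
// const fs = require('fs');
// const state = JSON.parse(fs.readFileSync('terraform.resources.tfstate', 'utf8'));
// console.log(summarizeState(state));
```

After cdktf destroy, the same summary should come back empty, which is a quick way to confirm that nothing was left behind.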

.NET 5 Docker Lambda Function with API Gateway and Self-Mutating Pipeline Using CDK

February 11, 2021 · 9 min read

Scottie Enriquez

Los Angeles, CA

Deciding on Which Technology to Use

While infrastructure as code (IaC) has existed within the AWS ecosystem since 2011, adoption has exploded recently due to the ability to manage large amounts of infrastructure at scale and standardize design across an organization. There are almost too many options between CloudFormation (CFN), CDK, and Terraform for IaC and Serverless Application Model (SAM) and Serverless Framework for development. This article from A Cloud Guru quickly sums up the pros and cons of each option. I chose this particular stack for some key reasons:

CDK allows the infrastructure and the CI/CD pipeline to be described as C# instead of YAML, JSON, or HCL
CDK provides the ability to inject more robust logic than CloudFormation's intrinsic functions allow, along with more modularity, while still being a native AWS offering
Docker ensures that the Lambda functions run consistently across local development, builds, and production environments and simplifies dependency management
CDK Pipelines offer a higher-level construct with much less configuration than CodePipeline and streamline management of multiple environments

GitHub Repository

You can find a complete working example here.

Initializing the Project

Ensure that .NET 5 and the latest version of CDK are installed. To create a solution skeleton, run these commands in the root directory:

# note that CDK uses this directory name as the solution name
mkdir LambdaApiSolution
cd LambdaApiSolution
cdk init app --language=csharp
# creates a CFN stack called CDKToolkit with an S3 bucket for staging purposes and configures IAM permissions for CI/CD
cdk bootstrap --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess
cdk deploy

In order to use CDK Pipelines later on, a specific flag needs to be added to cdk.json:

LambdaApiSolution/cdk.json
{
  "app": "dotnet run -p src/LambdaApiSolution/LambdaApiSolution.csproj",
  "context": {
    ..
    "@aws-cdk/core:newStyleStackSynthesis": "true",
    ..
  }
}

At the time of writing, the generated CDK template uses .NET Core 3.1. Inside of the .csproj file, change the TargetFramework tag to net5.0.

LambdaApiSolution/src/LambdaApiSolution/LambdaApiSolution.csproj
<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net5.0</TargetFramework>
    </PropertyGroup>
</Project>

From the /LambdaApiSolution directory, run these commands to create the serverless skeleton:

# install the latest version of the .NET Lambda templates
dotnet new -i Amazon.Lambda.Templates
cd src/
# create the function
dotnet new lambda.image.EmptyFunction --name LambdaApiSolution.DockerFunction
# add the projects to the solution file
dotnet sln add LambdaApiSolution.DockerFunction/src/LambdaApiSolution.DockerFunction/LambdaApiSolution.DockerFunction.csproj
dotnet sln add LambdaApiSolution.DockerFunction/test/LambdaApiSolution.DockerFunction.Tests/LambdaApiSolution.DockerFunction.Tests.csproj
# build the solution and run the sample unit test to verify that everything is wired up correctly
dotnet test LambdaApiSolution.sln

Creating the Lambda Infrastructure and Build

First, add the Lambda CDK NuGet package to the CDK project.

<PackageReference Include="Amazon.CDK.AWS.Lambda" Version="1.90.0"/>

Then, create the Docker image and Lambda function using CDK constructs in LambdaApiSolutionStack.cs:

LambdaApiSolution/src/LambdaApiSolution/LambdaApiSolutionStack.cs
public class LambdaApiSolutionStack : Stack
{
    internal LambdaApiSolutionStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
    {
        // this path is relative to the directory where CDK commands are run
        // the directory must contain a Dockerfile
        DockerImageCode dockerImageCode = DockerImageCode.FromImageAsset("src/LambdaApiSolution.DockerFunction/src/LambdaApiSolution.DockerFunction");
        DockerImageFunction dockerImageFunction = new DockerImageFunction(this, "LambdaFunction", new DockerImageFunctionProps()
        {
            Code = dockerImageCode,
            Description = ".NET 5 Docker Lambda function"
        });
    }
}

Lastly, update the Dockerfile in the Lambda function project like so to build the C# code:

LambdaApiSolution/src/LambdaApiSolution.DockerFunction/src/LambdaApiSolution.DockerFunction/Dockerfile
FROM public.ecr.aws/lambda/dotnet:5.0
FROM mcr.microsoft.com/dotnet/sdk:5.0 as build-image

ARG FUNCTION_DIR="/build"
ARG CONFIGURATION="release"
ENV PATH="/root/.dotnet/tools:${PATH}"

RUN apt-get update && apt-get -y install zip

RUN mkdir $FUNCTION_DIR
WORKDIR $FUNCTION_DIR
COPY Function.cs LambdaApiSolution.DockerFunction.csproj aws-lambda-tools-defaults.json $FUNCTION_DIR/
RUN dotnet tool install -g Amazon.Lambda.Tools

RUN mkdir -p build_artifacts
RUN if [ "$CONFIGURATION" = "debug" ]; then dotnet lambda package --configuration Debug --package-type zip; else dotnet lambda package --configuration Release --package-type zip; fi
RUN if [ "$CONFIGURATION" = "debug" ]; then cp -r /build/bin/Debug/net5.0/publish/* /build/build_artifacts; else cp -r /build/bin/Release/net5.0/publish/* /build/build_artifacts; fi

FROM public.ecr.aws/lambda/dotnet:5.0

COPY --from=build-image /build/build_artifacts/ /var/task/
CMD ["LambdaApiSolution.DockerFunction::LambdaApiSolution.DockerFunction.Function::FunctionHandler"]

At this point, you can now deploy the changes with the cdk deploy command. The Lambda function can be tested via the AWS Console. The easiest way to do so is to navigate to the CloudFormation stack, click on the function resource, and then create a test event with the string "hello" as the input. Note that this should not be a JSON object because the event handler's parameter currently accepts a single string.

Integrating API Gateway

Add the following packages to the CDK project:

<PackageReference Include="Amazon.CDK.AWS.APIGatewayv2" Version="1.90.0"/>
<PackageReference Include="Amazon.CDK.AWS.APIGatewayv2.Integrations" Version="1.90.0"/>

Next, you can add the API Gateway resources to the stack immediately after the DockerImageFunction in LambdaApiSolutionStack.cs:

LambdaApiSolution/src/LambdaApiSolution/LambdaApiSolutionStack.cs
HttpApi httpApi = new HttpApi(this, "APIGatewayForLambda", new HttpApiProps()
{
    ApiName = "APIGatewayForLambda",
    CreateDefaultStage = true,
    CorsPreflight = new CorsPreflightOptions()
    {
        AllowMethods = new [] { HttpMethod.GET },
        AllowOrigins = new [] { "*" },
        MaxAge = Duration.Days(10)
    }
});

Then, create a Lambda proxy integration and a route for the function:

LambdaApiSolution/src/LambdaApiSolution/LambdaApiSolutionStack.cs
LambdaProxyIntegration lambdaProxyIntegration = new LambdaProxyIntegration(new LambdaProxyIntegrationProps()
{
    Handler = dockerImageFunction,
    PayloadFormatVersion = PayloadFormatVersion.VERSION_2_0
});
httpApi.AddRoutes(new AddRoutesOptions()
{
    Path = "/casing",
    Integration = lambdaProxyIntegration,
    Methods = new [] { HttpMethod.POST }
});

I used /casing for the path since the sample Lambda function returns an upper and lower case version of the input string. Finally, it's helpful to display the endpoint URL using a CFN output for testing.

LambdaApiSolution/src/LambdaApiSolution/LambdaApiSolutionStack.cs
// adding entropy to prevent a name collision
string guid = Guid.NewGuid().ToString();
CfnOutput apiUrl = new CfnOutput(this, "APIGatewayURLOutput", new CfnOutputProps()
{
    ExportName = $"APIGatewayEndpointURL-{guid}",
    Value = httpApi.ApiEndpoint
});

With these changes to the resources, the Lambda function can be invoked by a POST request. The handler method parameters in Function.cs need to be updated for the request body to be passed in.

LambdaApiSolution/src/LambdaApiSolution.DockerFunction/src/LambdaApiSolution.DockerFunction/Function.cs
// replace the string parameter with a proxy request parameter
public Casing FunctionHandler(APIGatewayProxyRequest apiGatewayProxyRequest, ILambdaContext context)
{
    // update the input to use the proxy
    string input = apiGatewayProxyRequest.Body;
    return new Casing(input.ToLower(), input.ToUpper());
}

After successfully deploying the changes, the function can be tested in two ways. The first way is through an HTTP client like Postman. Add a string to the body parameter of the POST request. This action tests the full integration with API Gateway as well as the Lambda function. To test via the Lambda Console, update the test event from before to match the APIGatewayProxyRequest parameter:

{
  "body": "hello"
}
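
To experiment with the request shape before deploying, the handler's logic can be mimicked locally. The block below is a JavaScript stand-in for illustration only, not the post's C# code. The lower and upper property names are assumptions about what the Casing type serializes to, and the empty-string fallback for a missing body is an addition that the C# handler above does not have.

```javascript
// JavaScript stand-in for the C# handler, for local experimentation only.
// `event` mirrors the one field of the API Gateway proxy request used here.
function functionHandler(event) {
  // the raw request body arrives as a string; it may be missing or null
  const input = event && typeof event.body === 'string' ? event.body : '';
  return { lower: input.toLowerCase(), upper: input.toUpperCase() };
}

// same shape as the Lambda Console test event above
console.log(functionHandler({ body: 'Hello' }));
// a request without a body falls back to empty strings
console.log(functionHandler({}));
```

The second call shows why a guard may be worth adding: without one, a request with no body would reach the case conversion with a null input.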

Adding CI/CD Using CDK Pipelines

For this example, the source code resides in GitHub as opposed to CodeCommit. To grant the CI/CD pipeline access to the repository, a personal access token with repo permissions must be created via GitHub and stored in Secrets Manager as a plaintext secret. Note that for this codebase, I've named my secret GitHub-Token.

GitHub Personal Access Token

Next, add the following packages to the CDK project:

<PackageReference Include="Amazon.CDK.AWS.CodeBuild" Version="1.90.0"/>
<PackageReference Include="Amazon.CDK.AWS.CodeDeploy" Version="1.90.0"/>
<PackageReference Include="Amazon.CDK.AWS.CodePipeline" Version="1.90.0"/>
<PackageReference Include="Amazon.CDK.AWS.CodePipeline.Actions" Version="1.90.0"/>
<PackageReference Include="Amazon.CDK.Pipelines" Version="1.90.0"/>

With these dependencies loaded, create a file called PipelineStack.cs. The following code creates a self-mutating CDK Pipeline, adds a GitHub source action to fetch the code using the token from Secrets Manager, and synthesizes the solution's CDK:

LambdaApiSolution/src/LambdaApiSolution/PipelineStack.cs
using Amazon.CDK;
using Amazon.CDK.AWS.CodeBuild;
using Amazon.CDK.AWS.CodePipeline;
using Amazon.CDK.AWS.CodePipeline.Actions;
using Amazon.CDK.Pipelines;

namespace LambdaApiSolution
{
    public class PipelineStack : Stack
    {
        internal PipelineStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
        {
            Artifact_ sourceArtifact = new Artifact_();
            Artifact_ cloudAssemblyArtifact = new Artifact_();
            CdkPipeline pipeline = new CdkPipeline(this, "LambdaApiSolutionPipeline", new CdkPipelineProps()
            {
                CloudAssemblyArtifact = cloudAssemblyArtifact,
                PipelineName = "LambdaApiSolutionPipeline",
                SourceAction = new GitHubSourceAction(new GitHubSourceActionProps()
                {
                    ActionName = "GitHubSource",
                    Output = sourceArtifact,
                    OauthToken = SecretValue.SecretsManager(Constants.GitHubTokenSecretsManagerId),
                    // these values are in Constants.cs instead of being hardcoded
                    Owner = Constants.Owner,
                    Repo = Constants.RepositoryName,
                    Branch = Constants.Branch,
                    Trigger = GitHubTrigger.POLL
                }),
                SynthAction = new SimpleSynthAction(new SimpleSynthActionProps()
                {
                    Environment = new BuildEnvironment
                    {
                        // required for .NET 5
                        // https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-available.html
                        BuildImage = LinuxBuildImage.STANDARD_5_0
                    },
                    SourceArtifact = sourceArtifact,
                    CloudAssemblyArtifact = cloudAssemblyArtifact,
                    // navigates to the proper subdirectory to simplify other commands
                    Subdirectory = "LambdaApiSolution",
                    InstallCommands = new [] { "npm install -g aws-cdk" },
                    BuildCommands = new [] { "dotnet build src/LambdaApiSolution.sln" },
                    SynthCommand = "cdk synth"
                })
            });
        }
    }
}

Remove the following line from Program.cs since the pipeline will deploy the API from now on:

new LambdaApiSolutionStack(app, "LambdaApiSolutionStack");

Delete the previous stack, commit the latest changes to the source code so that they'll be available when the pipeline fetches the repo, and finally deploy the pipeline:

cdk destroy
git add .
git commit -m "Adding source code to GitHub repository"
git push origin main
cdk deploy LambdaApiSolutionPipelineStack

Creating Multiple Environments

From now on, the pipeline will manage changes instead of manual cdk deploy commands. By merely pushing changes to the main branch, the pipeline will update itself and the other resources. The last feature in this example is adding development, test, and production environments. Rather than creating more stacks, we can leverage stages instead. Each environment will have a stage that makes a separate stack plus actions like approvals or integration testing. First, a stage must be defined in code. For this example, a stage will only contain an API stack.

LambdaApiSolution/src/LambdaApiSolution/SolutionStage.cs
using Amazon.CDK;
using Construct = Constructs.Construct;

namespace LambdaApiSolution
{
    public class SolutionStage : Stage
    {
        public SolutionStage(Construct scope, string id, IStageProps props = null) : base(scope, id, props)
        {
            LambdaApiSolutionStack lambdaApiSolutionStack = new LambdaApiSolutionStack(this, "Solution");
        }
    }
}

To implement the stages, navigate back to PipelineStack.cs and append the following code after the pipeline declaration:

LambdaApiSolution/src/LambdaApiSolution/PipelineStack.cs
CdkStage developmentStage = pipeline.AddApplicationStage(new SolutionStage(this, "Development"));
CdkStage testStage = pipeline.AddApplicationStage(new SolutionStage(this, "Test"));
testStage.AddManualApprovalAction(new AddManualApprovalOptions()
{
    ActionName = "PromoteToProduction"
});
CdkStage productionStage = pipeline.AddApplicationStage(new SolutionStage(this, "Production"));

Next Steps

The Lambda function, API Gateway, and multi-environment CI/CD pipeline are now in place. More Lambda functions can be added as separate C# projects. More stacks can be created and added to SolutionStage.cs.

Cloud9 IDE Configuration

November 8, 2020 · 3 min read

Scottie Enriquez

Los Angeles, CA

Cloud9 Overview and Use Cases

Per AWS, "Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Cloud9 comes prepackaged with essential tools like Docker and support for popular programming languages, including JavaScript, Python, PHP, and .NET." The AWS, Serverless Application Model (SAM), and Cloud Development Kit (CDK) CLIs are pre-installed as well. Users are abstracted from credential management (i.e., there's no need to provision an IAM user and run aws configure). Since the underlying compute is an EC2 instance, developers have a consistent experience across client devices.

Cloud9 makes it easy to declare an environment using CloudFormation, specify Git repositories to clone during the provisioning process, and share various custom settings such as themes and keybindings with developers. It's also a cheap option since the EC2 instance shuts itself down after a set period of time (with a default of 30 minutes).

Initial Setup

The first deployment fails unless a Cloud9 environment has previously been created from the Console, because that process creates a required IAM service role (service-role/AWSCloud9SSMAccessRole). See more information in the AWS documentation.

AWS Resources Created

A Cloud9 environment with an m5.large EC2 instance
A CodeCommit repository for stashing work since the Cloud9 environment is considered transient

CloudFormation Template

Resources:
  rCloud9Environment:
    Type: AWS::Cloud9::EnvironmentEC2
    Properties:
      AutomaticStopTimeMinutes: 30
      ConnectionType: CONNECT_SSM
      Description: Web-based cloud development environment
      InstanceType: m5.large
      Name: Cloud9Environment
      Repositories:
        - PathComponent: /repos/codecommit
          RepositoryUrl: !GetAtt rCloud9WorkingRepository.CloneUrlHttp
        - PathComponent: /repos/aws-cloud9-environment
          RepositoryUrl: https://github.com/scottenriquez/aws-cloud9-environment.git
  rCloud9WorkingRepository:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryName: Cloud9WorkingRepository
      RepositoryDescription: A CodeCommit repository for stashing work from the Cloud9 IDE

This template can be deployed via the AWS Console or the AWS CLI.

Initialization Script

wget https://github.com/dotnet/core/files/2186067/openssl-libs-ami.spec.txt
rpmbuild --bb openssl-libs-ami.spec.txt
sudo rpm -i /usr/src/rpm/RPMS/x86_64/openssl-libs-1.0.0-0.x86_64.rpm
sudo rpm -Uvh https://packages.microsoft.com/config/centos/7/packages-microsoft-prod.rpm
sudo yum install dotnet-sdk-3.1 zsh
sudo passwd ec2-user
chsh -s /bin/zsh
sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

The user-data.sh script is intended to run when the Cloud9 instance spins up (mirroring the EC2 instance parameter of the same name). Unfortunately, this setup must be done manually since the CloudFormation resource has no equivalent parameter. To make this easier, I've added this GitHub repository to the list of repositories to clone onto the instance.

The Bash script does the following:

Installs .NET Core 3.1
Installs Zsh and Oh My Zsh
Resets the ec2-user password
Changes the default shell to Zsh

User Settings

{
  "ace": {
    "@behavioursEnabled": true,
    "@fontSize": 18,
    "@keyboardmode": "vim",
    "@showGutter": true,
    "@showInvisibles": true,
    "@theme": "@c9/ide/plugins/c9.ide.theme.jett/ace.themes/jett",
    "custom-types": {
      "json()": {
        "settings": "javascript"
      }
    },
    "statusbar": {
      "@show": true
    }
  },
  "build": {
    "@autobuild": false
  },
  "collab": {
    "@timeslider-visible": false
  },
  "configurations": {},
  "debug": {
    "@autoshow": true,
    "@pause": 0
  },
  "general": {
    "@autosave": "afterDelay",
    "@skin": "jett-dark"
  },
  "key-bindings": {
    "@platform": "auto",
    "json()": []
  },
  "openfiles": {
    "@show": false
  },
  "output": {},
  "projecttree": {
    "@showhidden": false
  },
  "tabs": {
    "@show": true
  },
  "terminal": {
    "@fontsize": 18
  }
}

Much like the user-data, the user settings aren't parameterized in CloudFormation. These settings are included in the repository but must be manually configured.

Using Former2 for Existing AWS Resources

September 26, 2020 · 5 min read

Scottie Enriquez

Los Angeles, CA

Overview

I've been making a concerted effort lately to use infrastructure as code via CloudFormation for all of my personal AWS-hosted projects. Writing these templates can feel a bit tedious, even with editor tooling and plugins. I thought it would be awesome to generate CloudFormation templates for existing resources and first found CloudFormer. I found blog posts about CloudFormer from as far back as 2013, but it was never advertised much.

Update: Former2 is the de facto standard now that CloudFormer has been deprecated. I kept my notes on CloudFormer for posterity at the end of the post.

Getting Started with Former2

Former2 takes a client-side approach to infrastructure as code template generation and has support for Terraform and CDK. Instead of an EC2 instance, it uses the JavaScript SDKs via your browser to make all requisite API calls. You can even use the static website hosted on the public internet. If you're not keen on the idea of passing read-only IAM credentials to a third-party website, clone the repository and run the web application locally via the file system or Docker. I've also created a CloudFormation template to run it on an EC2 instance:

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  pAllowedIpCidr:
    Type: String
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: '0.0.0.0/0'
  pLatestAl2AmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
  pVpcId:
    Type: AWS::EC2::VPC::Id
  pSubnetId:
    Type: AWS::EC2::Subnet::Id
  pKeyPairName:
    Type: AWS::EC2::KeyPair::KeyName
Description: A self-hosted instance of Former2 on EC2
Resources:
  rEc2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Former2 security group
      GroupName: Former2
      VpcId: !Ref pVpcId
      SecurityGroupIngress:
        - CidrIp: !Ref pAllowedIpCidr
          IpProtocol: tcp
          FromPort: 80
          ToPort: 443
      SecurityGroupEgress:
        - CidrIp: !Ref pAllowedIpCidr
          IpProtocol: tcp
          FromPort: 80
          ToPort: 443
  rEc2Instance:
    Type: AWS::EC2::Instance
    Properties:
      UserData:
        Fn::Base64: |
          #!/bin/bash
          yum update -y
          yum install git httpd -y
          systemctl start httpd
          systemctl enable httpd
          cd /var/www/html
          git clone https://github.com/iann0036/former2.git .
      ImageId: !Ref pLatestAl2AmiId
      InstanceType: t2.micro
      KeyName: !Ref pKeyPairName
      Tenancy: default
      SubnetId: !Ref pSubnetId
      EbsOptimized: false
      SecurityGroupIds:
        - !Ref rEc2SecurityGroup
      SourceDestCheck: true
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            Encrypted: false
            VolumeSize: 8
            VolumeType: gp2
            DeleteOnTermination: true
      HibernationOptions:
        Configured: false
Outputs:
  PublicIp:
    Description: Former2 EC2 instance public IP address
    Value: !GetAtt rEc2Instance.PublicIp
    Export:
      Name: !Sub '${AWS::StackName}-PublicIp'
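
The AllowedPattern on pAllowedIpCidr is a shape check rather than full CIDR validation, which is easy to see by running the same expression in Node.js. This is a small sketch with sample values chosen only for illustration (203.0.113.7 comes from a documentation address range). The anchors are added on the assumption that CloudFormation requires the whole parameter value to match.

```javascript
// same expression as AllowedPattern, anchored on the assumption that
// CloudFormation matches the pattern against the entire value
const allowedPattern = /^((\d{1,3})\.){3}\d{1,3}\/\d{1,2}$/;

// illustrative inputs: the template default, a single host, a value
// missing its prefix length, and an out-of-range address
const candidates = ['0.0.0.0/0', '203.0.113.7/32', '10.0.0.1', '999.999.999.999/99'];

for (const value of candidates) {
  console.log(value, allowedPattern.test(value));
}
```

The pattern rejects a bare address such as 10.0.0.1 but accepts out-of-range octets and prefix lengths, so a typo like 999.999.999.999/99 would only be caught later when the security group is created.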

Use Cases for Generating Templates

Overall I’d argue that addressing the minor changes is easier than writing a template from scratch. With that being said, I don’t know that I’d ever spin up resources via the Console with the sole intent of creating CloudFormation templates. However, it could make migrating from a prototype to a productionized product easier if you’re willing to pay a small compute cost.

Getting Started with CloudFormer (Deprecated)

Setting up CloudFormer is quite simple through CloudFormation. In fact, it's a sample template that creates a stack with several resources:

AWS::EC2::Instance
AWS::EC2::SecurityGroup
AWS::IAM::InstanceProfile
AWS::IAM::Role
AWS::IAM::Policy

The template has a few parameters as well:

Username
Password
VPC

After creating the stack like any other CloudFormation template, a URL is output. The t2.small EC2 instance is a web server with a public IPv4 address and DNS configured behind the scenes. The security group allows all traffic (0.0.0.0/0) on port 443, but it's worth noting that I did have an SSL issue with my instance that threw a warning in my browser. The instance profile is used by the web server to assume the IAM role with an attached policy that allows for widespread reads across resources and writes to S3. Keep in mind that the CloudFormer stack should be deleted after use to avoid unnecessary compute charges for the EC2 instance.

Using the CloudFormer Web Server

Navigate to the URL from the output tab of the CloudFormation stack (something like https://ec2-0-0-0-0.compute-1.amazonaws.com) and enter the username and password that you specified as a parameter. Via the GUI, select the resources to reverse engineer across the following categories:

DNS
VPC
VPC Network
VPC Security
Network
Managed Services
Managed Config
Compute
Storage
Storage Config
App Services
Security
Operational

The list is robust but not all-inclusive.

Creating a Template for a CloudFront Distribution

I have a public CDN in one of my AWS accounts for images on a JAMstack site hosted on Netlify. It uses a standard design: a private S3 bucket behind a CloudFront distribution with an Origin Access Identity. Through the CloudFormer workflow, I selected the individual components:

CloudFront distribution
S3 bucket
Bucket policy

Sadly, there's no support for YAML as of right now. The web server generated a JSON template, which I converted to YAML via the Designer.

I plugged the template back into CloudFormation, and everything provisioned successfully. Digging deeper into the template, I noticed a few minor changes to make. First of all, the logical names are based on specifics of the existing resources (e.g., distd1yqxti3jheii7cloudfrontnet came from the URL of the CDN). However, these can easily be refactored. Since CloudFormer doesn't support creating an OAI, the existing identity is hardcoded. I added a resource for that to the template and converted the hardcoded value to a reference.

Fantasy Football Power Rankings Markdown Generator

August 21, 2020 · 3 min read

Scottie Enriquez

Los Angeles, CA

Overview

spr is a CLI tool that generates Markdown pages for power rankings posts in the league's Gatsby site, with Sleeper data injected. The source code is hosted on GitHub.

Installation and Dependencies

Clone the repository, install NPM dependencies, and create a symlink in the global folder.

git clone git@github.com:the-winner-is-a-twiath/power-rankings-markdown-cli.git
cd power-rankings-markdown-cli
npm install
npm link
spr --version

The CLI source code doesn't store any secrets, so ensure that the AWS CLI is installed and that the credentials are configured at ~/.aws/credentials.

[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

If not, run aws configure.

Usage

Navigate to the root folder of the league's Gatsby site, and run spr add <WEEK_NUMBER> <AUTHOR_FIRST_NAME>. The alias for add is a. Open the generated index.md file in the newly created directory (<FIRST_NAME>-week-<WEEK_NUMBER>-power-rankings/) to enter the power rankings text for the new post.

Functionality

Validates the week number and author first name
Checks the current Git branch to ensure that the user has created a non-main branch
Verifies that the present working directory contains a Gatsby configuration file to standardize the relative paths
Fetches the league members and rosters from the Sleeper API
Fetches the current avatar for each league member and copies it to a CDN hosted in AWS
Generates Markdown power rankings with the member's latest stats neatly formatted
Creates a new directory for the post in the Gatsby website and writes the index.md file

Configuration

The league-specific details exist in various JavaScript configuration files to maximize reusability. While the CLI is tightly coupled with Gatsby, there's still much that can be reconfigured for other leagues.

JavaScript: lib/config/gatsby.js
const gatsby = {
  // used to determine if the user created a new branch
  mainBranchName: 'master',
  // used to determine if the user is in the root Gatsby directory
  configFileName: 'gatsby-config.js',
  // used to support any changes to the default blog path for vanity URLs
  postPath: '/content/blog/posts',
  // used to defer image styling for the avatar to the Gatsby site
  avatarHTMLClass: 'sleeper-avatar',
};

JavaScript: lib/config/aws.js
const awsConfig = {
  // S3 bucket
  bucketName: 'twiath-site-cdn',
  // URL base to be used for source in <img> tag
  cdnURLBase: 'https://d1yqxti3jheii7.cloudfront.net',
};

JavaScript: lib/config/league.js
const league = {
  // Sleeper league ID number
  id: '541384381865115648',
};

JavaScript: lib/config/validAuthors.js
const authors = {
  Scottie: 'Scottie Enriquez',
  Callen: 'Callen Trail',
  Logan: 'Logan Richardson',
  Carl: 'Carl Meziere',
  Andrew: 'Andrew Carlough',
  John: 'John Yarrow',
  Matt: 'Matt Kniowski',
  Chris: 'Chris Ramsey',
  Caleb: 'Caleb Trantow',
  Travis: 'Travis James',
  Trond: 'Trond Liu',
  Mark: 'Mark Hamilton',
};

JavaScript: lib/config/validWeeks.js
// an array rather than an object literal, since the entries have no keys
const weeks = [
  'zero',
  'one',
  'two',
  'three',
  'four',
  'five',
  'six',
  'seven',
  'eight',
  'nine',
  'ten',
  'eleven',
  'twelve',
  'thirteen',
  'fourteen',
  'fifteen',
  'sixteen',
  'seventeen',
];
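
To make the validation step concrete, here is a minimal sketch of how the author and week lookups above might be used to check the CLI arguments. It is written in Python for brevity rather than the CLI's own JavaScript. The function name validate_arguments, the truncated tables, and the idea that the week number is an index into the week list are all assumptions for illustration, not the tool's actual implementation.

```python
from typing import Tuple

# Hypothetical, truncated mirrors of validAuthors.js and validWeeks.js.
VALID_AUTHORS = {"Scottie": "Scottie Enriquez", "Callen": "Callen Trail"}
VALID_WEEKS = ["zero", "one", "two", "three"]


def validate_arguments(week_number: str, author_first_name: str) -> Tuple[str, str]:
    """Return (week_word, author_full_name), or raise ValueError on bad input."""
    try:
        week_index = int(week_number)
    except ValueError:
        raise ValueError(f"Week must be a whole number: {week_number!r}") from None
    # Assumption: the week number is the position of its word in the list.
    if not 0 <= week_index < len(VALID_WEEKS):
        raise ValueError(f"Week out of range: {week_index}")
    if author_first_name not in VALID_AUTHORS:
        raise ValueError(f"Unknown author: {author_first_name!r}")
    return VALID_WEEKS[week_index], VALID_AUTHORS[author_first_name]
```

With these tables, validate_arguments("2", "Scottie") returns ("two", "Scottie Enriquez"), while an unknown author or an out-of-range week raises a ValueError before any files are written.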

Continuous Integration for Swift Packages in Azure DevOps

August 4, 2020 · 4 min read

Scottie Enriquez

Los Angeles, CA

Overview

I use Azure DevOps for building CI/CD pipelines for professional projects. Given the GitHub integration and broader feature set, I've started using it instead of Travis CI for my open-source projects too. For most technology areas, there's a wide set of pre-built tasks that can be leveraged to build robust pipelines quickly. There are several tasks for compiling and publishing iOS applications using Xcode on transient macOS virtual machines.

However, in the spirit of using Swift like a general-purpose language, I wanted to use a Linux build server, a more industry-standard approach for CI/CD. In my previous post, I described how I set up a Swift executable package to be more testable. This pipeline provides continuous integration for it. Azure Pipelines, which powers CI/CD in Azure DevOps, is scripted in YAML. It also supports integrating shell commands to be run on the virtual machine.

Adding a Trigger

The first thing to specify in the YAML is a trigger. The trigger denotes which branches in the Git repository the build should be run for. For example, to run the build only for master, the trigger would be as follows:

trigger:
  - master

In general, I want CI to run on all branches, so I use the following YAML instead:

trigger:
  branches:
    include:
      - '*'

Specifying a Virtual Machine Image

After specifying the trigger, Azure Pipelines needs to know what type of infrastructure to run the build on. At the time of writing, 5.2 is the latest stable version of Swift. Swift is not currently available in APT, Ubuntu's package manager. The binaries from the Swift download page target a specific LTS version of Ubuntu. The most recent version listed is 18.04, even though 20.04 was released in April. Because of these specific requirements, I opted to target a specific version of Ubuntu in my YAML instead of ubuntu-latest. ubuntu-latest will be updated to 20.04 at some point, but this is outside my control.

pool:
  vmImage: 'ubuntu-18.04'

Installing Swift Programmatically

With a product like Azure Pipelines that utilizes transient virtual machines, the customer pays for the server time. In short, the longer your builds take, the more expensive they are. Because of this and performance reasons, it doesn't make sense to compile Swift from source each time the build runs (i.e., when a developer commits). The best practice is to fetch dependencies via the distro's package manager for easier versioning and simple installation. With that not being an option for Swift on Ubuntu, the next best option is to fetch the binaries.

Azure Pipelines supports steps, which are logical sections of the build for human readability and organization. At a high level, the process is to:

Install dependencies for running Swift that aren't shipped with Ubuntu
Make a working directory
Fetch the Swift binaries
Unzip the binaries
Add the binaries to the PATH so that swift can be used as a shell command
Echo the version to ensure that it's working properly

In the pipeline script, the steps above are written as Bash commands and wrapped in a script YAML statement.

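steps:
  - script: |
      # -y answers the confirmation prompt, which a non-interactive build cannot do
      sudo apt-get install -y clang libicu-dev
      mkdir swift-install
      cd swift-install
      wget https://swift.org/builds/swift-5.2.4-release/ubuntu1804/swift-5.2.4-RELEASE/swift-5.2.4-RELEASE-ubuntu18.04.tar.gz
      tar -xvzf swift-5.2.4-RELEASE*
      # the swift binary lives in usr/bin inside the extracted directory
      export PATH=$PATH:$(pwd)/swift-5.2.4-RELEASE-ubuntu18.04/usr/bin
      # export only lasts for this step; prependpath exposes swift to later steps
      echo "##vso[task.prependpath]$(pwd)/swift-5.2.4-RELEASE-ubuntu18.04/usr/bin"
      swift --version
    displayName: 'Install Swift'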
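
Since the Swift version and Ubuntu release each appear several times in the script, it may help to see how the download URL is assembled from them. The sketch below is in Python purely for illustration. The function name is hypothetical, and the URL pattern is inferred from the single URL used in this post, so other releases should be checked against the Swift download page.

```python
def swift_download_url(version: str, ubuntu_release: str) -> str:
    """Build the swift.org tarball URL for a release, e.g. ("5.2.4", "18.04")."""
    platform = "ubuntu" + ubuntu_release       # ubuntu18.04, used in the file name
    platform_dir = platform.replace(".", "")   # ubuntu1804, used in the directory name
    tag = f"swift-{version}-RELEASE"
    return (
        f"https://swift.org/builds/swift-{version}-release/"
        f"{platform_dir}/{tag}/{tag}-{platform}.tar.gz"
    )
```

Calling swift_download_url("5.2.4", "18.04") reproduces the URL passed to wget above, which makes it easier to see which parts need to change when bumping either version.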

Additional Steps

With Swift successfully installed, the remainder of the build steps is scripted in additional steps. This commonly entails compiling, running unit tests, and static code analysis. For the sake of a simple executable package, this could be merely running swift test like below. Putting it all together, this YAML script is a solid base for many Swift package CI jobs.

trigger:
  branches:
    include:
      - '*'

pool:
  vmImage: 'ubuntu-18.04'

steps:
  - script: |
      sudo apt-get install -y clang libicu-dev
      mkdir swift-install
      cd swift-install
      wget https://swift.org/builds/swift-5.2.4-release/ubuntu1804/swift-5.2.4-RELEASE/swift-5.2.4-RELEASE-ubuntu18.04.tar.gz
      tar -xvzf swift-5.2.4-RELEASE*
      export PATH=$PATH:$(pwd)/swift-5.2.4-RELEASE-ubuntu18.04/usr/bin
      # export only lasts for this step; prependpath exposes swift to later steps
      echo "##vso[task.prependpath]$(pwd)/swift-5.2.4-RELEASE-ubuntu18.04/usr/bin"
      swift --version
    displayName: 'Install Swift'

  - script: |
      swift test
    displayName: 'Run unit tests'

Creating a Swift 5.2 Executable with Unit Tests

August 3, 2020 · 4 min read

Scottie Enriquez

Los Angeles, CA

Background

To better learn Swift, I've been trying to use it as a truly general-purpose programming language instead of purely iOS development. I'm currently building an iOS app that requires multiple versions of the same vector graphics (SVGs). I couldn't find an open-source solution for my needs, so I decided to start scripting. Typically, I would have used Python or Node.js, but I powered through with Swift in the spirit of immersion.

Getting the initial project structure and unit tests set up took some research, so this quick guide will outline how I've been structuring my codebases for executable packages. Outside of iOS development, Swift's documentation isn't as robust as that of Python or Node.js, given the age difference. This blog post's objective is to merge a lot of useful knowledge I found across forums.

Creating the Project

Use the Swift CLI to create an executable project with this command: swift package init --type executable. It's important to note that the names will be created based on the current directory. If you want to use a name for your project other than the root directory, create a new folder and run the command there.

mkdir AlternatePackageName
cd AlternatePackageName
swift package init --type executable

To open in Xcode, run open Package.swift. Swift has created a project with the following structure:

├── Package.swift
├── README.md
├── Sources
│   └── SwiftPackageExecutable
│       └── main.swift
└── Tests
    ├── LinuxMain.swift
    └── SwiftPackageExecutableTests
        ├── SwiftPackageExecutableTests.swift
        └── XCTestManifests.swift

Creating a Library

Executable modules are not testable. The implication is that functions cannot be tested inside /Sources/SwiftPackageExecutable (in the same directory as main.swift). Doing so will throw an unhelpful compiler error. The alternative is to move the logic to a library module. This requires a change to the directory structure and default Package.swift.

// swift-tools-version:5.2

import PackageDescription

let package = Package(
    name: "SwiftPackageExecutable",
    dependencies: [],
    targets: [
        .target(
            name: "SwiftPackageExecutable",
            dependencies: []),
        .testTarget(
            name: "SwiftPackageExecutableTests",
            dependencies: ["SwiftPackageExecutable"]),
    ]
)

First, set the products variable in between the name and dependencies. Create .executable and .library entries like so:

name: "SwiftPackageExecutable",
products: [
    .executable(name: "SwiftPackageExecutable", targets: ["SwiftPackageExecutable"]),
    .library(name: "SwiftPackageLibrary", targets: ["SwiftPackageLibrary"]),
],
dependencies: [],

Next, in the array of targets, add another .target for the library, and update the dependencies. The executable and test modules should depend on the library.

.target(
    name: "SwiftPackageExecutable",
    dependencies: ["SwiftPackageLibrary"]),
.target(
    name: "SwiftPackageLibrary",
    dependencies: []),
.testTarget(
    name: "SwiftPackageExecutableTests",
    dependencies: ["SwiftPackageLibrary"]),

The completed Package.swift is as follows:

// swift-tools-version:5.2

import PackageDescription

let package = Package(
    name: "SwiftPackageExecutable",
    products: [
        .executable(name: "SwiftPackageExecutable", targets: ["SwiftPackageExecutable"]),
        .library(name: "SwiftPackageLibrary", targets: ["SwiftPackageLibrary"]),
    ],
    dependencies: [],
    targets: [
        .target(
            name: "SwiftPackageExecutable",
            dependencies: ["SwiftPackageLibrary"]),
        .target(
            name: "SwiftPackageLibrary",
            dependencies: []),
        .testTarget(
            name: "SwiftPackageExecutableTests",
            dependencies: ["SwiftPackageLibrary"]),
    ]
)

Lastly, create a new directory inside of /Sources/ for the new library.

Creating Logic and Unit Tests

For a simple example, add some easily testable logic like addition. The Swift file should reside at /Sources/SwiftPackageLibrary/Add.swift.

import Foundation

public struct Add {
    public static func integers(_ first: Int, to second: Int) -> Int {
        return first + second
    }
}

Inside of the test module, add a standard test for the library module function.

import XCTest
@testable import SwiftPackageLibrary

final class AddTests: XCTestCase {
    // On macOS, XCTest only discovers methods whose names begin with "test"
    func testShouldAddTwoIntegersForStandardInput() throws {
        // Arrange
        let first = 1
        let second = 2
        let expectedSum = 3

        // Act
        let actualSum = Add.integers(first, to: second)

        // Assert
        XCTAssertEqual(actualSum, expectedSum)
    }

    // Used by LinuxMain.swift, where tests are registered explicitly
    static var allTests = [
        ("testShouldAddTwoIntegersForStandardInput", testShouldAddTwoIntegersForStandardInput),
    ]
}

Lastly, update XCTestManifests.swift.

import XCTest

#if !canImport(ObjectiveC)
public func allTests() -> [XCTestCaseEntry] {
    return [
        testCase(AddTests.allTests)
    ]
}
#endif

Putting It All Together

With all this in place, you can now unit test your library logic and expose it as an executable in the main.swift file.

├── Package.swift
├── README.md
├── Sources
│   ├── SwiftPackageExecutable
│   │   └── main.swift
│   └── SwiftPackageLibrary
│       └── Add.swift
└── Tests
    ├── LinuxMain.swift
    └── SwiftPackageExecutableTests
        ├── AddTests.swift
        └── XCTestManifests.swift

To run the executable, use swift run. To run the unit tests, use swift test.

Using Repl.it in a High School Classroom

July 30, 2020 · 5 min read

Scottie Enriquez

Los Angeles, CA

Background

I co-teach an advanced placement computer science class at Heights High School in Houston with the TEALS program. The curriculum utilizes the Java programming language and has an emphasis on object-oriented programming. From a school IT system perspective, we have limited options for the software we can distribute to the students' laptops. During my first year of volunteering, we opted to use BlueJ for the first couple of months before exposing them to Eclipse, a more heavy-duty integrated development environment. Both editors have their challenges and limitations for novices, so we began to evaluate several other options, including Visual Studio Code. After considering numerous factors, including the complexity of installation, editor features, and accessibility, we opted to try a radically different option: Repl.it.

Benefits and Implications

Repl.it is a feature-rich, browser-based IDE with support for numerous programming languages, including Java. In addition to the editor and computing environment, the application supports classroom features such as creating assignments that I will detail further below. Since Repl.it runs in the browser, there's no installation or configuration in terms of editors, runtimes, etc. Using a browser-based tool decreased the number of local support issues that we had to address. We found that students had far fewer problems getting acclimated to the tooling compared to BlueJ and Eclipse. The user interface proved to be intuitive. There were relatively few issues with the underlying runtimes and virtualization that Repl.it abstracts from the user.

Repl.it IDE

Repl.it requires an internet connection, and teachers shouldn't assume that students have internet access at home. Though many classes will be online due to the COVID-19 global pandemic, keep in mind that students may have limited connectivity. I recommend offering desktop IDEs as an offline alternative so that students can at least run code locally.

Setting Up a Classroom

Repl.it is free for schools. There's an excellent video overview of the features on YouTube. Last year, we used Repl.it Classroom for assigning coding homework. We use other software like Practice-It for some assignments but struggled to find a way to evaluate raw source code. Repl.it simplified grading because we didn't have to pull down students' source code and build on our local machines.

Integrating with GitHub

While Repl.it is excellent for running code and submitting assignments, it doesn't offer built-in source control. Teachers create classrooms on a per-year basis, so sharing examples and references across classes isn't straightforward. Each environment targets an individual student exercise, so collaboration isn't seamless either.

GitHub offers a public place to store code and implement software development workflows like pull requests and CI/CD. At Heights High School, we've hosted solutions here for students and any other teachers who want to use the code in their classrooms. The source code for this project resides in a public repository as well. Repl.it has native GitHub integration so that a public repository can be imported when creating a new Repl. The Repl syncs with the GitHub repository so that when a developer pushes changes to the remote origin, the updates propagate to Repl.it.

Creating a Template

With GitHub, a team can create a template project to be used when a new repository is created. Templates allow developers to have codebase structure (i.e., putting source code in /src) and configuration files injected into every child repository. Under the repository settings page, check the template repository flag. After this, when creating a new repository, the template should appear as an option for the base.

Template Project for Repl.it

This GitHub project contains a template repository.

Main File

In terms of source code, only the "Hello, World!" program is included:

OpenJDK: Main.java
class Main {
    public static void main(String[] args) {
        System.out.println("Hello, world!");
    }
}

Given that most assignments fit into a single file, I haven't injected any opinions on the file structure.

EditorConfig

I've included an EditorConfig file in this project so that the code styling remains consistent across multiple codebases. EditorConfig is IDE-agnostic with plugins or native integration across IntelliJ, Visual Studio Code, Eclipse, etc.

EditorConfig: .editorconfig
root = true

[*]
indent_style = tab
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true

Repl.it Configuration File

Repl.it supports a configuration file that allows a developer to specify which language and run command to use. Without going into details about the recent changes to Oracle's Java license structure, I'll note that this project uses OpenJDK 10, which is free and open-source. The run variable in the configuration file refers to a shell command to compile and execute the program. Bash on the underlying Linux virtual machine interprets the command, so it isn't specific to Repl.it. The run command can be tested on a local computer or by modifying the configuration file directly in Repl.it.

GNU Bash: .replit
language = "java10"
run = "javac Main.java && java Main"

Fork the Project

Creating College Football Recruiting Database on AWS Athena

June 9, 2020 · 6 min read

Scottie Enriquez

Los Angeles, CA

About College Football Recruiting

Despite being played by amateur student-athletes, college football has become a multi-billion dollar industry. Most likely due to the emotional connection to an academic institution and the incredibly entertaining and volatile lack of parity and consistency amongst teams, college football fans tend to be even more diehard than their NFL counterparts, particularly in the South. Though college football is played by undergraduate and graduate students, players are scouted as recruits as early as middle school. These recruits are evaluated based on several factors that indicate their potential for success at both the collegiate and professional levels of football. Whether physical attributes like height and weight or skill sets like blocking and catching, all of these attributes plus countless others are synthesized into a rating. Universities then extend scholarship offers to recruits, culminating in commitments and signings. A good recruiting class can be an indication of future success for a college football team, provided that the coaching staff develops talent as expected.

Source Code

This repository is a set of Python and shell scripts to fetch and process publicly available data from 247 for non-commercial, personal data analysis use to be done using AWS Athena. It's co-authored by Callen Trail. As is the nature of web scrapers, the HTML parsing code is brittle. If the page structure changes on the site, the scripts will need to be modified. The process is split into three stages.

Stage One: Fetching Recruit Lists by Year

Players are scraped from the recruiting index page in the following format:

{
  "247_id": "46038819",
  "247_url": "https://247sports.com/Player/Bryan-Bresee-46038819",
  "full_name": "Bryan Bresee",
  "year": 2020,
  "position": "DT",
  "high_school": "Damascus",
  "city": "Damascus",
  "state": "MD",
  "score": "0.9995",
  "stars": 5,
  "height_feet": 6.0,
  "height_inches": 5.0,
  "weight": 290.0
}

All Python dependencies are located in requirements.txt. To run, simply execute the command python scrape_recruit_list.py <START_YEAR> <END_YEAR>. This range dictates the volume of data captured by the core logic of the script. Note that Python's range excludes its upper bound, so <END_YEAR> itself is not scraped:

import json
import sys

import emoji

# get_number_of_pages_for_year and parse_page_of_recruits are helpers defined elsewhere in the script
recruits_per_page = 50
year_range = range(int(sys.argv[1]), int(sys.argv[2]))
recruit_list_path = './recruit-lists'
for year in year_range:
    recruit_list = []
    number_of_pages_for_year = get_number_of_pages_for_year(year, recruits_per_page)
    for page_index in range(1, number_of_pages_for_year + 1):
        url = f'https://247sports.com/Season/{year}-Football/CompositeRecruitRankings/?page={page_index}'
        print(emoji.emojize(f':rocket: Fetching: {url}'))
        parse_page_of_recruits(url, recruit_list, year)
    file_name = f'{recruit_list_path}/recruit-list-{year}.json'
    with open(file_name, 'w') as output_file:
        json.dump(recruit_list, output_file)
    print(emoji.emojize(f':file_folder: Wrote {year} recruits to {file_name}'))

The script will generate a file for each year (e.g., recruit-list-2020.json) in the /recruit-lists directory. The files in this directory are treated as build output and ignored via the .gitignore. There's also some basic exception handling to insert default values for inconsistent or missing data.
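
The loop's bounds are easy to misread, so here is a small, network-free sketch of which years and page URLs a given invocation covers. The real script obtains the page count through get_number_of_pages_for_year, whose implementation isn't shown in this post; the recruit total passed in below is a stand-in assumption, as are both function names.

```python
import math
from typing import List


def years_to_scrape(start_year: int, end_year: int) -> List[int]:
    # Mirrors range(int(sys.argv[1]), int(sys.argv[2])): end_year is excluded.
    return list(range(start_year, end_year))


def page_urls(year: int, total_recruits: int, recruits_per_page: int = 50) -> List[str]:
    # Assumption: the page count is the recruit total divided by the page size, rounded up.
    number_of_pages = math.ceil(total_recruits / recruits_per_page)
    return [
        f"https://247sports.com/Season/{year}-Football/CompositeRecruitRankings/?page={page_index}"
        for page_index in range(1, number_of_pages + 1)
    ]
```

For example, years_to_scrape(2018, 2020) yields only 2018 and 2019, so the end argument needs to be 2021 to include the 2020 class. Under the rounding assumption, page_urls(2020, 120) produces three URLs, for pages 1 through 3.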

Stage Two: Obtaining Ranking History and Recruiting Timeline Events

With a set of lists generated by stage one, the process_recruits.py script fetches and parses the complete ranking history and timeline of events (e.g., official visits and offers). To run, pass a recruiting list from stage one and the corresponding year to produce the files: python process_recruits.py 2020 recruit-lists/recruit-list-2020.json.

Recruit ranking histories are stored in the following path: /recruit-ranking-histories/{year}/{247_id}.json. For example, Bryan Bresee's path would be /recruit-ranking-histories/2020/46038819.json in the following format:

{
  "247_id": "46038819",
  "rating": 0.9995,
  "rank": 1,
  "change_date": "2020-01-31",
  "delta": -0.0002,
  "delta_inception": 0.0295
}

Recruiting timeline events are stored in the following path: /recruit-timeline-histories/{year}/{247_id}.json. For example, Bryan Bresee's path would be /recruit-timeline-histories/2020/46038819.json in the following format:

{
  "247_id": "46038819",
  "event_date": "2020-01-08",
  "event_type": "Enrollment",
  "event_description": "Bryan Bresee enrolls at Clemson Tigers",
  "school": "Clemson"
}
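
Both output layouts follow the same {directory}/{year}/{247_id}.json pattern, which the sketch below captures with pathlib. The helper name write_history and the choice to store a list of records per file are assumptions made for illustration; the actual script may structure this differently.

```python
import json
from pathlib import Path
from typing import Any, List


def write_history(base_dir: str, year: int, recruit_id: str, records: List[Any]) -> Path:
    """Write records to {base_dir}/{year}/{recruit_id}.json and return the path."""
    year_dir = Path(base_dir) / str(year)
    # Create the per-year subdirectory on demand.
    year_dir.mkdir(parents=True, exist_ok=True)
    output_path = year_dir / f"{recruit_id}.json"
    with open(output_path, "w") as output_file:
        json.dump(records, output_file)
    return output_path
```

Calling write_history("recruit-ranking-histories", 2020, "46038819", records) would write to recruit-ranking-histories/2020/46038819.json, matching the example path for Bryan Bresee.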

Given the large amount of data to process during stage two, this repository also includes a bootstrapping shell script for EC2 instances to install the Python tooling, configure the virtual environment, and pull the data from stage one via S3. Pass the following to the user data field when provisioning a new EC2 instance:

#!/bin/bash
sudo yum install git -y
sudo yum install python3 -y
git clone https://github.com/scottenriquez/247-recruiting-ranking-history-scraper.git
cd 247-recruiting-ranking-history-scraper
mkdir recruit-lists
mkdir recruit-ranking-histories
mkdir recruit-timeline-histories
aws s3 cp s3://247-recruit-rankings-2010-2020/recruit-list/ recruit-lists --recursive
python3 -m venv env
source env/bin/activate
# no sudo here: it would bypass the virtual environment, and user data already runs as root
pip3 install -r requirements.txt

Note that since S3 bucket names are globally unique, this will need to be changed for any other bucket. An IAM role with access to the target bucket must be attached to the EC2 instances as well.

Stage Three: Cleanup, Normalization, and Optimization

After the first two stages, there are three output directories containing data:

/recruit-lists now contains one file per year containing all recruits from that year.

/recruit-ranking-histories now contains subdirectories for each year storing an individual JSON file per recruit capturing ranking changes.

/recruit-timeline-histories now contains subdirectories for each year storing an individual JSON file per recruit capturing events like official visits.

There are also several utility scripts to apply additional transformations. The first of these is merge_utility.py, which merges all recruit files in each of the top-level year directories into a single file. Depending on the use case, this can be easier to manage than handling the thousands of files generated by stages one and two. Specifically, it is more performant for Athena, which prefers fewer, larger files (~100 MB is the sweet spot according to the AWS documentation) over a high volume of small files. To run, use the command python merge_utility.py <PATH_TO_DIRECTORY_WITH_FILES_TO_MERGE> <PATH_TO_OUTPUT_FILE>.
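
As a rough illustration of the merge step, the sketch below combines every JSON file under a directory into one newline-delimited file, the one-record-per-line layout that Athena's JSON SerDes read. The function name, the newline-delimited output, and the handling of files that hold either one record or a list are assumptions; merge_utility.py itself may behave differently.

```python
import json
from pathlib import Path


def merge_json_files(input_directory: str, output_file: str) -> int:
    """Merge every .json file under input_directory into one newline-delimited file."""
    output_path = Path(output_file).resolve()
    # Collect inputs first so the output file is never read back in as an input.
    input_paths = sorted(
        path for path in Path(input_directory).rglob("*.json")
        if path.resolve() != output_path
    )
    record_count = 0
    with open(output_path, "w") as merged_file:
        for input_path in input_paths:
            with open(input_path) as source_file:
                data = json.load(source_file)
            # Assumption: a file holds either a single record or a list of records.
            records = data if isinstance(data, list) else [data]
            for record in records:
                merged_file.write(json.dumps(record) + "\n")
                record_count += 1
    return record_count
```

The return value is the number of records written, which is a cheap sanity check against the number of recruits expected for a given year.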

Both the recruiting history and event timeline scraping produced numerous duplicates. These data structures don't have a unique identifier per record (a recruit's 247_id repeats across that recruit's ranking changes and events). duplicate_composite_utility.py creates a composite key by concatenating all fields together to detect duplicates and deletes accordingly:

def build_composite_key(data):
    composite_key = ''
    for key in data.keys():
        composite_key += str(data[key])
    return composite_key

To run, use the command python duplicate_composite_utility.py <PATH_TO_FILE_WITH_COMPOSITE_KEY>.
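
To show how the composite key can drive deduplication, here is a minimal in-memory sketch. build_composite_key follows the function above; remove_duplicates is a hypothetical helper, and the real utility works on a file rather than a list.

```python
def build_composite_key(data):
    # Same idea as above: concatenate every field value into one string.
    return ''.join(str(value) for value in data.values())


def remove_duplicates(records):
    """Keep the first occurrence of each record, preserving the original order."""
    seen_keys = set()
    unique_records = []
    for record in records:
        key = build_composite_key(record)
        if key not in seen_keys:
            seen_keys.add(key)
            unique_records.append(record)
    return unique_records
```

One caveat of plain concatenation is that different records can collide: the values 1 and 23 produce the same key as 12 and 3. For this dataset the date and description fields make that unlikely, but joining the values with a delimiter, or using a tuple of values as the key, would rule it out.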

Numerous duplicate recruits exist after producing the recruit lists in stage one, so duplicate_utility.py can be run to clean a stage one output file in place: python duplicate_utility.py <PATH_TO_RECRUIT_LIST_FILE>.

Configuring AWS Athena

For this project, Athena is cheaper and simpler to stand up than a dedicated, relational database that would require additional ETL jobs or scripts to migrate from the JSON source files to tables. Athena uses serverless compute to query these raw files directly from S3 with ANSI SQL. After Athena and the Glue Data Catalog have been configured, SQL queries can be run against the datasets in-place. For example, this query computes when commits from the 2020 class were extended offers by the University of Texas at Austin:

select recruit.full_name, timeline.event_type, timeline.event_date, timeline.event_description
from timeline_events timeline
join recruit_list recruit on  recruit."247_id" = timeline."247_id"
where timeline.event_type = 'Offer' and timeline.event_description like '%Texas Longhorns%' and recruit.year = 2020
order by event_date desc
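
The same query can also be run programmatically. The sketch below assumes a boto3 Athena client is passed in and uses its start_query_execution, get_query_execution, and get_query_results calls. The function name, database name, and results bucket are hypothetical placeholders, not values from this project.

```python
import time


def run_athena_query(athena_client, query, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return the first page of rows."""
    execution = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    execution_id = execution["QueryExecutionId"]
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    results = athena_client.get_query_results(QueryExecutionId=execution_id)
    # The first row of the result set holds the column headers.
    return results["ResultSet"]["Rows"]
```

A call might look like run_athena_query(boto3.client("athena"), sql, "recruiting", "s3://example-athena-results/"), with the database and bucket replaced by real ones. Taking the client as a parameter keeps the function easy to test with a stub.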

Azure DevOps CI/CD Pipeline for an AWS Lambda Node.js Function

February 8, 2020 · 10 min read

Overview

This project serves as an end-to-end working example for testing, building, linting, and deploying an AWS Lambda Node.js function to multiple environments using AWS CloudFormation, Azure Pipelines, and Azure DevOps. The complete source code is located in this GitHub repository, and the build output is publicly available via Azure DevOps.

Setting Up a Git Repository

Even though I'm using Azure Pipelines for CI/CD instead of Travis CI, you can easily host the code in a Git repository on Azure DevOps or GitHub. Microsoft's GitHub integration is seamless, so there's no reason not to use it should you choose to host your source code there. All features like pull request integration and showing build status alongside each commit on GitHub behave exactly like Travis CI. To enable GitHub integration, simply navigate to the Azure DevOps project settings tab, select 'GitHub connections', then follow the wizard to link the repository of your choice.

Creating an NPM Project for the Lambda Function

A simple npm init command will create the package.json file and populate relevant metadata for the Lambda function. All dependencies and development dependencies are documented there.

Implementing a Sample Lambda Function

In the root of the project, there's a file called index.js with the Lambda function logic. For this example, the handler function simply returns a 200 status code with a serialized JSON body.

JavaScript: index.js
exports.handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify('Hello from Lambda!'),
});

Adding Unit Tests and Code Coverage

First, install a few development dependencies using the command npm install --save-dev mocha chai nyc. I've added a unit test in the file test/handler.test.js:

JavaScript: test/handler.test.js
const mocha = require('mocha');
const chai = require('chai');
const index = require('../index');

const { expect } = chai;
const { describe } = mocha;
const { it } = mocha;

// describe callbacks are synchronous; only the test body itself needs to be async
describe('Handler', () => {
  describe('#handler()', () => {
    it('should return a 200 response with a body greeting the user from Lambda', async () => {
      const expectedResponse = {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
      };
      const actualResponse = await index.handler(null);
      expect(actualResponse).to.deep.equal(expectedResponse);
    });
  });
});

To configure code coverage rules for the CI/CD pipeline, add a .nycrc (Istanbul configuration) file to the root of the project. For this example, I've specified 80% across branches (e.g., if statement paths), lines, functions, and statements. You can also use the include attribute to whitelist the files that the coverage rules apply to.

JSON: .nycrc
{
  "branches": 80,
  "lines": 80,
  "functions": 80,
  "statements": 80,
  "check-coverage": true,
  "all": true,
  "include": ["**.js"]
}

With this in place, wire up everything in the package.json with the proper test command:

JSON: package.json
...
"scripts": {
    "test": "nyc --reporter=text mocha"
},
...

You can verify that everything is configured correctly by running npm test to view unit testing results and code coverage reports.

Configuring Code Linting and Styling

It's important to think of linting and styling as two separate entities. Linting is part of the CI/CD pipeline and serves as static code analysis. This provides feedback on code that could potentially cause bugs and should cause a failure in the pipeline if issues are found. Styling, on the other hand, is opinionated and provides readability and consistency across the codebase. However, it may not be part of the build pipeline itself (i.e., causing the build to fail if a style rule is violated) and should be run locally prior to a commit.

For configuring ESLint, I used @wesbos' configuration as a base using the command npx install-peerdeps --dev eslint-config-wesbos. Detailed instructions can be found in his README. This makes the .eslintrc config in the root quite clean:

JSON: .eslintrc
{
  "extends": ["wesbos"]
}

Given that code styling is quite opinionated, I won't inject any biases here. To install Prettier as a development dependency, use the command npm install --save-dev prettier and add .prettierrc and .prettierignore files to the root.

With this in place, you can add linting and Prettier commands to the package.json:

JSON: package.json
...
"scripts": {
  "lint": "eslint .",
  "lint:fix": "eslint . --fix",
  "format": "prettier --write \"**/*.{js,jsx,json,md}\""
},
...

Though there is no configuration managed in this repository for code styling, note that you can enable an IDE like Visual Studio Code or JetBrains' WebStorm to apply styling rules upon saving a file.

Enabling Continuous Integration Using Azure Pipelines

Via the Azure DevOps web UI, you can directly commit an initial azure-pipelines.yml file to the root of the repository and configure the trigger (i.e. commits). Once the NPM scripts are properly set up like above, the build stage can be configured to install dependencies, run unit tests, and handle linting in a few lines of code. Note that I've added an archive step because Lambda functions are deployed as ZIP files later in the pipeline.

YAML: azure-pipelines.yml
stages:
  - stage: Build
    jobs:
      - job: BuildLambdaFunction
        pool:
          vmImage: 'ubuntu-latest'
        continueOnError: false
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '12.x'
            displayName: 'Install Node.js'
          - script: |
              npm install
              npm run lint
              npm test
            displayName: 'NPM install, lint, and test'
          - task: ArchiveFiles@2
            inputs:
              rootFolderOrFile: '$(Build.SourcesDirectory)'
              includeRootFolder: true
              archiveType: 'zip'
              archiveFile: '$(Build.ArtifactStagingDirectory)/LambdaBuild.zip'
              replaceExistingArchive: true
              verbose: true

For now, there is only one stage in the pipeline, but additional stages will be managed in the same YAML file later. The code above spins up a Linux virtual machine, installs Node.js version 12.x, installs the dependencies specified in the package.json file, runs ESLint, and finally runs the unit tests. The logs are made available via Azure DevOps, and the virtual machine is destroyed after the build is complete. If an error occurs at any point (e.g., a lint issue or a failed unit test), the build does not continue.

Configuring Local Azure Pipeline Builds

As the name suggests, Azure Pipelines run in the cloud. It is also possible to host your own build agents. Self-hosting takes quite a bit of configuration, so for this project, I opted to use the cloud-hosted agent instead. Microsoft has extensive documentation for setting up self-hosted agents, and I've included the Dockerfile in the dockeragent/ directory.

Enabling Infrastructure as Code Using AWS CloudFormation

One of the core goals of this project is to create a complete solution with everything from the source code to the build pipeline and cloud infrastructure managed under source control. CloudFormation is a technology from AWS that allows engineers to specify solution infrastructure as JSON or YAML. For this solution, I specified a Lambda function and an IAM role. Note that the build artifact will be sourced from an additional S3 staging bucket not detailed in the CloudFormation template.

JSONcloudformation-stack.json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "IAMLambdaRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": ["lambda.amazonaws.com"]
              },
              "Action": ["sts:AssumeRole"]
            }
          ]
        }
      }
    },
    "LambdaFunction": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": "azdo-staging-s3-bucket",
          "S3Key": "build.zip"
        },
        "Handler": "index.handler",
        "Runtime": "nodejs12.x",
        "Role": {
          "Fn::GetAtt": ["IAMLambdaRole", "Arn"]
        }
      },
      "DependsOn": ["IAMLambdaRole"]
    }
  }
}

With the CloudFormation template in hand, creating and/or updating the infrastructure can be done via the command line using the AWS CLI. After generating an access key and secret key, the CLI can be installed and configured with a few commands. Note that I have specified the commands for Ubuntu (apt-get package manager) since that's the virtual machine image that was specified in the Azure Pipelines YAML.

sudo apt-get install awscli
aws configure set aws_access_key_id $(AWS_ACCESS_KEY_ID)
aws configure set aws_secret_access_key $(AWS_SECRET_KEY_ID)
aws configure set region $(AWS_DEFAULT_REGION)

These keys should be treated as a username/password combination. Do not expose them in any public source code repositories or build logs. They should always be stored as secure environment variables in the build pipeline. Azure DevOps will always hide secure environment variables even in public project logs.
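
One way to catch a missing credential early without ever printing its value is a small pre-flight check. The sketch below is hypothetical: the helper name is made up, and the variable names simply mirror the ones used in the commands above.

```javascript
// Names match the pipeline variables used in this post; the helper itself is hypothetical.
const REQUIRED_VARIABLES = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_KEY_ID', 'AWS_DEFAULT_REGION'];

// Returns the names (never the values) of variables that are unset or empty.
function findMissingVariables(env, names = REQUIRED_VARIABLES) {
  return names.filter((name) => !env[name]);
}

// Example with a sample object standing in for process.env.
const missing = findMissingVariables({
  AWS_ACCESS_KEY_ID: 'example',
  AWS_DEFAULT_REGION: 'us-west-2',
});
console.log(missing); // [ 'AWS_SECRET_KEY_ID' ]
```

In a real build, process.env would be passed instead of the sample object. Note that Azure Pipelines does not expose secret variables to scripts as environment variables automatically, so they would need to be mapped into the step's environment explicitly for a check like this to see them.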

After the CLI has been configured, the aws cloudformation deploy command will create or update the infrastructure specified in the template. I recommend testing this command locally before including it in the build pipeline.
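
For local testing from a Node.js project, the command can be assembled programmatically. This is only a sketch: the function name, option names, and stack name are placeholders, while the flags are the same ones used in this post's deployment stage.

```javascript
// Builds the argument list for `aws cloudformation deploy`.
// The function and option names are hypothetical; the flags mirror this post's pipeline.
function buildDeployArgs({ stackName, templateFile, environment }) {
  if (!stackName || !templateFile || !environment) {
    throw new Error('stackName, templateFile, and environment are all required');
  }
  return [
    'cloudformation', 'deploy',
    '--stack-name', stackName,
    '--template-file', templateFile,
    '--tags', `Environment=${environment}`,
    '--capabilities', 'CAPABILITY_NAMED_IAM',
    '--no-fail-on-empty-changeset',
  ];
}

const args = buildDeployArgs({
  stackName: 'lambda-stack-development', // placeholder stack name
  templateFile: 'cloudformation-stack.json',
  environment: 'Development',
});

// Print the full command so it can be reviewed before running it.
console.log(`aws ${args.join(' ')}`);
```

To actually run it, the array could be handed to child_process.execFileSync('aws', args, { stdio: 'inherit' }), which avoids shell quoting issues because each argument is passed separately.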

Enabling Multi-Stage and Multi-Environment Continuous Deployments

With the ability to deploy cloud infrastructure, the build pipeline can now be a full CI/CD one. In the Azure DevOps UI, environments can be created via the project settings. For this project, I created development, test, and production. These will be referenced in the Azure Pipelines YAML script and capture a history of which build deployed which artifact to the corresponding environment.

Another stage can be added to the YAML script that depends on a successful build:

YAMLazure-pipelines.yml
- stage: DevelopmentDeployment
  dependsOn: Build
  jobs:
    - deployment: LambdaDevelopment
      pool:
        vmImage: 'ubuntu-latest'
      environment: 'Development'
      strategy:
        runOnce:
          deploy:
            steps:
              - script: |
                  sudo apt-get install awscli
                  aws configure set aws_access_key_id $(AWS_ACCESS_KEY_ID)
                  aws configure set aws_secret_access_key $(AWS_SECRET_KEY_ID)
                  aws configure set region $(AWS_DEFAULT_REGION)
                displayName: 'install and configure AWS CLI'
              - script: |
                  aws s3 cp $(Pipeline.Workspace)/LambdaBuild/s/$(AWS_CLOUDFORMATION_TEMPLATE_FILE_NAME) s3://$(AWS_S3_STAGING_BUCKET_NAME)
                  aws s3 cp $(Pipeline.Workspace)/LambdaBuild/a/LambdaBuild.zip s3://$(AWS_S3_STAGING_BUCKET_NAME)
                displayName: 'upload CloudFormation template and Lambda function ZIP build to staging bucket'
              - script: |
                  aws cloudformation deploy --stack-name $(AWS_STACK_NAME_DEVELOPMENT) --template-file $(Pipeline.Workspace)/LambdaBuild/s/$(AWS_CLOUDFORMATION_TEMPLATE_FILE_NAME) --tags Environment=Development --capabilities CAPABILITY_NAMED_IAM --no-fail-on-empty-changeset
                displayName: 'updating CloudFormation stack'

Note that I have parameterized certain inputs (e.g., $(AWS_ACCESS_KEY_ID)) as build environment variables to be reusable and secure. Again, these are managed via settings in Azure DevOps and not committed to source control.

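A Note on Sharing Files Among Pipeline Stages

Because each stage in the Azure Pipeline spins up a separate virtual machine, files such as the build artifact are not immediately accessible between build stages. In the build stage, a task can be added to publish a pipeline artifact that can be shared between stages; later stages can then reach it under the $(Pipeline.Workspace) path. Because the task below publishes the entire workspace, the sources end up under LambdaBuild/s and the archived ZIP under LambdaBuild/a, which is why the deployment scripts above reference those paths.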

YAMLazure-pipelines.yml
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Pipeline.Workspace)'
    artifact: 'LambdaBuild'
    publishLocation: 'pipeline'

Security Checks

Most organizations will require some sort of human approval before migrating to production. This can be configured via Azure DevOps at an environment level. From the web UI, each environment can be configured with separate approvers. For this project, I have configured it so that only production requires approval.

Limiting Production Deployments to the Master Branch Only

As part of a continuous deployment implementation, production migrations should happen every time that the master branch is updated via a pull request. However, all branches should still be privy to the CI/CD benefits. In the Azure Pipelines YAML script, the production stage can be configured to be skipped if the source branch is not master:

YAMLazure-pipelines.yml
- stage: ProductionDeployment
  condition: and(succeeded(), eq(variables['build.sourceBranch'], 'refs/heads/master'))
  dependsOn: TestDeployment

This prevents developers from having to manually reject or skip releases from non-master branches that should never go to production.
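
The logic of that condition can be sketched in plain JavaScript to make the behavior explicit. The function below is a hypothetical model for illustration only; Azure Pipelines evaluates the real expression itself.

```javascript
// Hypothetical model of the stage condition:
// and(succeeded(), eq(variables['build.sourceBranch'], 'refs/heads/master'))
function shouldDeployToProduction({ previousStagesSucceeded, sourceBranch }) {
  // The stage runs only when earlier stages passed AND the build came from master.
  const isMaster = sourceBranch === 'refs/heads/master';
  return Boolean(previousStagesSucceeded) && isMaster;
}

console.log(shouldDeployToProduction({ previousStagesSucceeded: true, sourceBranch: 'refs/heads/master' })); // true
console.log(shouldDeployToProduction({ previousStagesSucceeded: true, sourceBranch: 'refs/heads/feature/login' })); // false
console.log(shouldDeployToProduction({ previousStagesSucceeded: false, sourceBranch: 'refs/heads/master' })); // false
```

A feature branch therefore still gets the build, development, and test stages, while the production stage is skipped rather than left waiting for a manual rejection.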
