Experience

Brandon Kauffman

Linux Administration

RHCSA and Linux+ certified. I have managed 300+ RHEL servers, implementing Satellite and IAM and automating deployments with Ansible. I led regular system performance monitoring and tuning, including a 20% increase in I/O throughput.

Kubernetes

CKA and CCA certified. I have built and managed on-prem and EKS cloud Kubernetes clusters. I used advanced networking features to integrate with existing infrastructure.

At Liberty University, I managed a 20-node physical cluster that monitored our on-prem environment of 800 VMs and 7,000 IoT devices. This cluster also served as the monitoring system for our larger OpenShift cluster.

At Pulsar Informatics Inc., I led an EKS migration to use spot instances where possible in an autoscaling group, reducing costs by 70%. I automated deployments across accounts using Terraform and Flux.

Rust

I have built web servers, front ends, BPF programs, and other tools using Rust. It is my primary programming language for LeetCode and new projects. I often use Tauri to create desktop applications that speed up my workflow. I've contributed to several open source Rust projects.

I have led development on multiple projects, including an internal alert management system. Over the course of a year, that project reduced MTTA by 20% and MTTR by ~80%. I have also used Go to create many internal Prometheus exporters. I've contributed to several open source Go projects.

Python

I have built and maintained a variety of internal Python packages for a team of 12 others. I've done data analysis on Oracle and Microsoft SQL Server with Python. I have built and contributed to Django applications with GraphQL and REST. I've added OTEL instrumentation to an OSS project and modified open source libraries.

Site Reliability Engineering

I have SRE experience with a variety of monitoring products. I have maintained Elasticsearch, Grafana, Prometheus, OTEL with ClickHouse, and Thanos. I have used those products, along with Datadog and CloudWatch, to implement SLOs. By using these tools to guide deployment strategies, I increased uptime from ~99.997% to ~99.99999% for crucial services. I have worked with projects in Go, Python, Java, and PHP to implement RUM, APM, profiling, and distributed tracing. I've assisted developers in debugging applications and reduced average response time by ~20%.

I've used SNMP to monitor the network and physical appliances that support our infrastructure, and I created dashboards and reports to improve capacity planning.

Kafka/Redpanda

I maintained an on-prem production Kafka cluster with over 250 topics. The largest topic ingested 100,000 events per second. After switching to Redpanda, this topic sustained 10,000,000 events per second once a failed consumer had been restored. I also reduced the server count by 40% through better tuning.

AWS

I am a certified AWS Solutions Architect Professional. By applying best practices, I have reduced cloud spend by ~15% across various services. I have implemented security practices to harden our organization and improved our account structure for better management and scalability.

DevOps

I have managed CI/CD pipelines with a variety of tools. These pipelines covered mobile workloads, Terraform, Ansible, Flux, and more. I reduced CI/CD runtime by 60% by applying best practices, and I developed a self-hosted auto-scaling cluster to process the pipelines. I automated alert resolutions for various systems by creating a service to process alerts. I managed a variety of Terraform modules to deploy our services with IaC.

I was the SME for a variety of databases such as Postgres and MongoDB. I set up regular backups and replication and managed major version upgrades. In Postgres, I tuned parameters to optimize the database for dev, staging, and prod workloads, which reduced query times.