Pipeline Setup: Application Code Push to Production
Confidence at Scale: How ExpenseFlow Moves Code from Push to Production in 18 Minutes
For many engineering teams, the deployment process is a source of high-stakes anxiety. The period between merging a pull request and seeing it live
is often a black box filled with manual checks, fragile scripts, and the silent hope that the new code won't trigger a 2:00 AM incident. In my experience
managing global production environments, deployment anxiety is more than just a feeling—it is a symptom of a weak filter.
The ExpenseFlow platform replaces that anxiety with a high-velocity, 11-stage Jenkins declarative pipeline. Architecturally speaking, we treat every
commit as a liability until proven otherwise. By treating the pipeline as a sophisticated, multi-layered quality filter rather than a simple sequence of
commands, the system moves code from a developer’s machine to production in approximately 18 minutes. This is how we build a culture of
confidence.
1. The "Shift-Left" Economic Miracle
The pipeline begins with Stage 1 (Code Quality), where Linting (ESLint), Type Checking (TypeScript), and Dependency Auditing (npm audit) run
simultaneously. This stage is built on the Shift-Left Principle: catching bugs at the earliest possible moment when they are exponentially cheaper to fix.
Data consistently shows that Stage 1 alone catches approximately 60% of issues before a single test is even executed. By parallelizing these checks,
the total runtime is limited to the slowest individual task, ensuring basic errors are flagged in roughly two minutes. Shifting left isn't just a technical
preference; it is a fiscal necessity to protect the company's bottom line.
💡 The Shift-Left Principle: A lint error caught in 30 seconds costs almost nothing to fix. The same error caught in production can cost hours
of work involving incidents, postmortems, and hotfixes. Shifting these checks to the start of the pipeline is what reduces production incidents by
nearly half.
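To make the "limited to the slowest individual task" point concrete, here is a minimal Python sketch. It does not run ESLint, TypeScript, or npm audit; each check is a stand-in that only sleeps, and the durations are hypothetical values scaled down to fractions of a second.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_check(name: str, seconds: float) -> str:
    """Stand-in for a real check (lint, type check, audit); it only sleeps."""
    time.sleep(seconds)
    return name


def run_parallel(checks: dict[str, float]) -> tuple[list[str], float]:
    """Run every check concurrently; return (finished names, wall-clock seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(run_check, name, secs) for name, secs in checks.items()]
        finished = [f.result() for f in futures]
    return finished, time.perf_counter() - start


# Hypothetical durations in seconds, not measurements from the real pipeline.
CHECKS = {"eslint": 0.1, "tsc": 0.4, "npm-audit": 0.2}

if __name__ == "__main__":
    finished, elapsed = run_parallel(CHECKS)
    sequential = sum(CHECKS.values())
    print(f"{finished} finished in {elapsed:.2f}s (sequential: {sequential:.2f}s)")
```

The wall-clock time tracks the slowest check rather than the sum of all three, which is the same effect a parallel block gives Stage 1.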
2. Death to the "Latest" Tag
In Stage 3 (Docker Build), the pipeline adheres to the strict standard of Immutable Image Tags. While many teams fall into the trap of using the :latest
tag in production, this creates a catastrophic operational risk. Because :latest is mutable and can be overwritten, it becomes impossible to know
which version of the code is actually running during an emergency.
Instead, every image is tagged with its unique Git commit SHA (e.g., sha-a3f8c21). This ensures a perfect audit trail and guarantees that "what we test
is exactly what we deploy."
The Golden Rule of Immutable Image Tags: Never use "latest" in production. Git SHA tags are permanent. When an incident happens at 2:00 AM, you
can use the SHA to identify the exact code live in the cluster and resolve the issue in minutes rather than hours.
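As a rough illustration of the rule, the sketch below builds a SHA-based image reference and refuses anything untagged or tagged "latest". The function names and the registry host are hypothetical; only the sha-<short hash> tag format comes from the example above.

```python
import re

# Short or full hexadecimal Git commit hash (7 to 40 characters).
_SHA_PATTERN = re.compile(r"^[0-9a-f]{7,40}$")


def image_reference(repository: str, commit_sha: str) -> str:
    """Build an image reference tagged with the short commit SHA."""
    sha = commit_sha.strip().lower()
    if not _SHA_PATTERN.match(sha):
        raise ValueError(f"not a Git commit SHA: {commit_sha!r}")
    return f"{repository}:sha-{sha[:7]}"


def assert_deployable(reference: str) -> None:
    """Reject references that are untagged or that use the mutable 'latest' tag."""
    # Only inspect the last path component so a registry port is not read as a tag.
    last_component = reference.rsplit("/", 1)[-1]
    _, separator, tag = last_component.partition(":")
    if not separator or tag == "latest":
        raise ValueError(f"refusing to deploy mutable image reference: {reference}")


if __name__ == "__main__":
    # Hypothetical repository name used purely for illustration.
    ref = image_reference("registry.example.com/expenseflow/api", "a3f8c21")
    assert_deployable(ref)
    print(ref)
```

A check like assert_deployable could run just before any deploy step, so a stray "latest" reference fails fast instead of reaching the cluster.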
3. Security as a "Hard Block," Not an Afterthought
Stage 4 (Security Scan) employs a defense-in-depth strategy, utilizing Trivy CVE scans alongside OWASP Dependency-Check. Unlike systems that
merely provide reports for later review, this pipeline implements a hard block policy. If a CRITICAL or HIGH severity vulnerability is detected, the
scan exits with a non-zero status (Trivy's --exit-code 1 option) and the pipeline halts immediately.
In regulated industries, unpatched container vulnerabilities are a primary vector for breaches. To mitigate this risk, the pipeline follows a strict protocol
when a threat is found:
* Abort: The process stops immediately, preventing the vulnerable image from being pushed to the registry.
* Archive: Scan results are stored as JSON artifacts for the security team to review.
* Alert: Automated notifications are sent to the security-alerts channel, often triggering high-priority tickets.
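The Abort, Archive, Alert protocol can be sketched as a small gate function. This is an illustration under assumptions, not the pipeline's actual code: it assumes a Trivy-style JSON report with a top-level "Results" list whose entries carry "Vulnerabilities" with "Severity" and "VulnerabilityID" fields, and the archive and alert callables are hypothetical hooks standing in for artifact storage and the security-alerts channel.

```python
import json
from typing import Callable

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list[str]:
    """Return the IDs of CRITICAL/HIGH findings in a Trivy-style JSON report."""
    ids = []
    for result in report.get("Results", []):
        # The key may be missing or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                ids.append(vuln.get("VulnerabilityID", "unknown"))
    return ids


def security_gate(
    report: dict,
    archive: Callable[[str], None],
    alert: Callable[[str], None],
) -> int:
    """Archive the report, alert on blocking findings, and return an exit code."""
    archive(json.dumps(report, indent=2))  # Archive: always keep the evidence
    findings = blocking_findings(report)
    if findings:
        alert(f"Blocking vulnerabilities found: {', '.join(findings)}")  # Alert
        return 1  # Abort: a non-zero exit code halts the pipeline
    return 0
```

Returning the exit code instead of calling sys.exit directly keeps the gate easy to test; a wrapper script would pass the result to sys.exit.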
4. Why Mocks Aren't Enough (The Integration Stack)
While Stage 2 (Unit Tests) provides fast feedback, it relies on mocks. In our architecture, Stage 2 also acts as a cultural "contract" with developers by
enforcing a hard 80% line coverage gate. If coverage drops below this threshold, the code never reaches the integration phase.
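A coverage gate of this kind reduces to a single comparison. The sketch below assumes the test run emits an Istanbul-style json-summary report, where the overall line percentage sits at total.lines.pct; the report shape and function names are assumptions for illustration, since the post does not say which coverage tool is in use.

```python
LINE_COVERAGE_THRESHOLD = 80.0  # percent, matching the 80% gate described above


def line_coverage_pct(summary: dict) -> float:
    """Read the overall line coverage percentage from an Istanbul-style summary."""
    return float(summary["total"]["lines"]["pct"])


def coverage_gate(summary: dict, threshold: float = LINE_COVERAGE_THRESHOLD) -> bool:
    """Return True when line coverage meets or exceeds the threshold."""
    return line_coverage_pct(summary) >= threshold


if __name__ == "__main__":
    # Hypothetical summary values, not real project data.
    sample = {"total": {"lines": {"total": 1200, "covered": 1008, "pct": 84.0}}}
    print("pass" if coverage_gate(sample) else "fail")
```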
However, mocks cannot catch orchestration failures or database constraint violations. This is where Stage 5 (Integration Tests) serves as the "last line
of defense." Inside the Jenkins pod, the pipeline spins up a real environment using docker-compose.ci.yml, including actual Node.js services,
PostgreSQL, and Redis.
To maintain our 18-minute velocity, we use a "tmpfs" strategy: the database data directories are mounted on tmpfs, so PostgreSQL and Redis run
entirely in memory. Architecturally, we trade persistence (which is unnecessary in a CI environment) for I/O throughput. This removes the disk I/O
bottleneck that usually kills test velocity, making integration tests run roughly twice as fast without sacrificing realism.
5. Stage 6: The Point of No Return
Once the automated testing and security gates are cleared, the pipeline moves to Stage 6: Push to ECR. This is the most significant transition in the
journey. Before this point, the code was just a candidate; now, it receives the "Stamp of Approval."
By pushing the image to the AWS Elastic Container Registry, we create a permanent, immutable artifact. This specific SHA-tagged image becomes
our "Golden Image." It is this identical artifact that will be deployed to Dev, Staging, and eventually Production, so every environment runs exactly the same build.
6. The Power of the "Four-Eyes" Gate
Despite the high level of automation, Stage 10 (Manual Approval) is the vital human link and the only manual step in the journey to production. To
prevent "Pipeline Rot," where code sits in a pending state until the environment drifts too far from what was tested, the gate has a strict 24-hour
timeout.
This stage captures the APPROVED_BY identity to maintain a tamper-proof audit trail, which is vital for compliance. Using a comprehensive checklist,
a senior engineer must verify the evidence before the final push to production.
The Approver’s Checklist:
* Verify successful staging deployment and smoke tests.
* K6 Load Tests: Confirm p95 latency is under 500ms and error rate is below 0.1%.
* Kibana Logs: Search for new error patterns or regressions in the staging environment.
* Grafana Dashboards: Ensure system health metrics are within baseline parameters.
* Confirm the rollback plan is ready and known.
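The two numeric thresholds on the checklist are easy to encode, which can save the approver from reading raw numbers by eye. The sketch below is a hypothetical helper, not part of the pipeline: LoadTestSummary is an invented container for the two figures an approver would read off a load test run, and the limits are the ones stated in the checklist.

```python
from dataclasses import dataclass

P95_LIMIT_MS = 500.0      # checklist: p95 latency under 500 ms
ERROR_RATE_LIMIT = 0.001  # checklist: error rate below 0.1%


@dataclass(frozen=True)
class LoadTestSummary:
    """Hypothetical container for the two load test figures on the checklist."""
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests (0.001 == 0.1%)


def checklist_violations(summary: LoadTestSummary) -> list[str]:
    """Return a human-readable message for each threshold that is not met."""
    violations = []
    if summary.p95_latency_ms >= P95_LIMIT_MS:
        violations.append(
            f"p95 latency {summary.p95_latency_ms:.0f} ms is not under {P95_LIMIT_MS:.0f} ms"
        )
    if summary.error_rate >= ERROR_RATE_LIMIT:
        violations.append(
            f"error rate {summary.error_rate:.2%} is not below {ERROR_RATE_LIMIT:.1%}"
        )
    return violations
```

An empty list means both numeric checks pass. The log, dashboard, and rollback-plan items still need a human, which is the point of the gate.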
7. The 30-Second Safety Net
The final stage (Stage 11) executes a Rolling Update in the production cluster. We use a specific configuration of maxUnavailable=0 and maxSurge=2.
This ensures zero downtime because Kubernetes adds new capacity before it ever removes old pods; traffic flows continuously as pods are replaced
in batches of at most two.
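The arithmetic behind maxUnavailable=0 and maxSurge=2 can be checked with a toy simulation. This is a deliberately simplified model, not Kubernetes itself: it assumes new pods become ready the moment they are created, and the replica count of 4 in the usage example is a hypothetical value.

```python
def simulate_rollout(
    replicas: int, max_surge: int = 2, max_unavailable: int = 0
) -> list[tuple[int, int]]:
    """Return the (old_pods, new_pods) counts after each step of a rolling update.

    Simplified model: every new pod is treated as ready as soon as it is created.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while new < replicas or old > 0:
        # Surge: add new pods without exceeding replicas + max_surge in total.
        room = replicas + max_surge - (old + new)
        new += min(room, replicas - new)
        history.append((old, new))
        # Scale down: remove old pods while keeping enough pods available.
        removable = (old + new) - (replicas - max_unavailable)
        old -= min(removable, old)
        history.append((old, new))
    return history


if __name__ == "__main__":
    for old, new in simulate_rollout(replicas=4):
        print(f"old={old} new={new} total={old + new}")
```

With four replicas, the total never drops below 4 and never exceeds 6: capacity is added first, and old pods are removed only once replacements exist.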
The true strength of this stage is the Auto-Rollback mechanism. If the new version fails a readiness probe or crashes, Jenkins detects the failure and
triggers a restoration of the previous version in under two minutes.
🚨 PRODUCTION DEPLOY FAILED: When the pipeline detects a failure, it initiates an automatic rollback. A PagerDuty alert is triggered, and the team is
notified via Slack: "Automatic rollback initiated. Previous version restored." This allows engineers to investigate console logs without the pressure of an
ongoing outage.
Conclusion: The Pipeline is the Product
The philosophy of the ExpenseFlow architecture is simple: the pipeline is the product. Every stage exists to catch a specific class of problem, from a
simple syntax error to a critical security vulnerability or a database bottleneck.
By the time a developer sees the "Live in Production" Slack notification, the code has been through eleven distinct layers of validation. It leads to one
final question: Do you trust your automation enough to step away from the keyboard after a push?
With this model, the goal is always clear: git push → 18 minutes → production. Reliability isn't found in manual oversight; it is built into the filter.