Things you should definitely do in a greenfield application
A lot of these are Rails-specific, but they can be adapted to other stacks.
- Steal some things from https://github.com/discourse/discourse
- QueryTags in query logs, including the code namespace and function (Job, Controller+Action, etc) as well as the trace id
- Tests
- Use minitest
- Use fixtures, not factories
- Postgres
- Use tablespaces and put tables on tmpfs (RAM)
- Use unlogged tables
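- Relax durability settings (test databases only, since these trade crash safety for speed):
- fsync=off
- full_page_writes=off
- synchronous_commit=off
- autovacuum=off
- checkpoint_timeout=60m
- wal_level=minimal
- max_wal_senders=0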
- Enforce a maximum test runtime
- Fix or delete flaky tests
- Use https://github.com/basecamp/gh-signoff instead of a CI service
- Preconnect cross-origin domains
- Compress assets + use a CDN
- Compress response payloads (HTML/JSON)
- Sidekiq jobs
- Leverage https://github.com/sidekiq/sidekiq/wiki/Iteration
- SLO based queue naming (within_6_hours, within_0_seconds, etc.)
- Idea: expand sidekiq_options to accept slo and weight to pick/generate a queue name which is used automatically, e.g. sidekiq_options slo: 5.minutes, weight: 0.5
- Manual load shedding
- Short-circuit all jobs of class X with argument[0] = Y
- Implement Sidekiq batch invalidation by default for every job
- Provide fairness between tenants
- TODO: How? Sidekiq Limiter?
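
The SLO-based queue naming idea above could be sketched roughly as follows. This is a minimal sketch, not a Sidekiq feature: the `SloQueue` module, its bucket list, and the bucket names other than within_6_hours and within_0_seconds are assumptions made for illustration. The `weight` option is left out because the notes do not say what it should mean.

```ruby
# Hypothetical helper, not part of Sidekiq: maps an SLO in seconds to a queue name.
module SloQueue
  # Assumed bucket limits in seconds, mapped to within_* queue names.
  BUCKETS = {
    0      => "within_0_seconds",
    60     => "within_1_minute",
    300    => "within_5_minutes",
    3_600  => "within_1_hour",
    21_600 => "within_6_hours"
  }.freeze

  # Picks the largest bucket that does not exceed the requested SLO, so the
  # queue's promise is never weaker than what the job asked for.
  def self.queue_for(slo_seconds)
    raise ArgumentError, "slo must be >= 0" if slo_seconds.negative?

    limit = BUCKETS.keys.select { |seconds| seconds <= slo_seconds }.max
    BUCKETS.fetch(limit)
  end
end

puts SloQueue.queue_for(300) # within_5_minutes
puts SloQueue.queue_for(600) # within_5_minutes (rounded down to the stricter bucket)
```

In a job class this could be wired in through the existing `queue` option, for example `sidekiq_options queue: SloQueue.queue_for(5.minutes.to_i)`, until a real `slo:` option exists.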
- Avoid using UUIDs at all
- Use bigint primary keys, for public identifiers use sqids with a secret
- rack-mini-profiler (RMP) in all environments
- Real user monitoring (RUM) in the browser
- Autoscale web and worker
- Instrument request queue time
- Automated alerts/SLOs in Terraform
- Prosopite/strict_loading in tests
- jemalloc
- Turbo/DataStar/hx-boost
- Mise Tasks
- Code formatting
- Standardrb
- Enforce with git hooks https://github.com/sds/overcommit
- https://evilmartians.com/chronicles/gemfile-of-dreams-libraries-we-use-to-build-rails-apps
- Enforce a zero-bug policy
- All “Repository” operations must be batched
- Set up a CSP from day one
- Set a low statement_timeout in Postgres
- Track performance regressions
- Metric A: What I want to monitor, e.g., # of requests that took longer than 5 seconds, grouped by controller action
- Metric B: Same, but time-shifted by one week
- Alert: A/B > 2 (the current count is more than double last week's)
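
The week-over-week comparison can be illustrated with a small sketch. It compares the current slow-request count per controller action (Metric A) with the count from a week earlier (Metric B) and flags actions where the current count is more than double the baseline. The `regressions` method, the hash shape, and the sample numbers are hypothetical; in practice this rule would live in the monitoring system rather than in application code.

```ruby
# Hypothetical input: { "controller#action" => number of requests slower than 5s }.
# Returns the actions whose current count exceeds threshold * last week's count.
def regressions(current, week_ago, threshold: 2.0)
  current.select do |action, count|
    baseline = week_ago.fetch(action, 0)
    # Actions with no baseline are skipped here to avoid dividing by zero;
    # a real alert would probably treat them separately.
    baseline.positive? && count.to_f / baseline > threshold
  end.keys
end

current  = { "users#index" => 30, "posts#show" => 5, "reports#new" => 4 }
week_ago = { "users#index" => 10, "posts#show" => 4 }

p regressions(current, week_ago) # ["users#index"]
```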
- Feature flags
- Support a tree-like structure, since some feature flags are nested by nature. Enabling a child node should enable all parent nodes
- This can be done simply with a parent_feature_ids array column
- Notify when feature flags are fully rolled out or fully disabled, every week
- As a reminder to delete dead code branches
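
The nested feature flag idea might look like the sketch below. It uses an in-memory stand-in for a table with a parent_feature_ids array column; the `Flag` struct, the `FlagStore` class, and the flag names are assumptions for illustration, not an existing library. The traversal works whether the column stores only direct parents or all ancestors.

```ruby
# Hypothetical stand-in for a feature_flags row with a parent_feature_ids array column.
Flag = Struct.new(:id, :name, :parent_feature_ids, :enabled)

class FlagStore
  def initialize(flags)
    @flags = flags.to_h { |flag| [flag.id, flag] }
  end

  # Enables the flag and every ancestor reachable through parent_feature_ids.
  def enable(id)
    pending = [id]
    seen = {}
    until pending.empty?
      current = pending.pop
      next if seen[current] # guards against cycles and shared parents

      seen[current] = true
      flag = @flags.fetch(current)
      flag.enabled = true
      pending.concat(flag.parent_feature_ids)
    end
  end

  def enabled?(id)
    @flags.fetch(id).enabled
  end
end

store = FlagStore.new([
  Flag.new(1, "billing", [], false),
  Flag.new(2, "billing_invoices", [1], false),
  Flag.new(3, "billing_invoices_pdf", [2], false),
  Flag.new(4, "search", [], false)
])

store.enable(3)
p [1, 2, 3, 4].map { |id| store.enabled?(id) } # [true, true, true, false]
```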
- Set up span/slog fields for metrics/signals/monitoring, per request/job:
- Tenant ID
- User ID
- Feature Flags
- Error code
- Number of DB queries
- Number of DB tables queried
- Number of DB services hit
- Number of Elastic queries
- Number of Elastic services hit
- Number of cache reads
- Number of cache writes
- Number of cache services hit
- Number of object allocations
- Number of HTTP requests
- Number of HTTP request retries
- Number of HTTP requests uniqued by domain
- Time spent in DB
- Time spent reading from DB
- Time spent writing to DB
- Time spent in cache
- Time spent reading from cache
- Time spent writing to cache
- Time spent in view layer
- Time spent in HTTP requests
- Time spent
- CPU wall time spent
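
One way to gather count and time fields like these is a small per-request (or per-job) collector that is flattened into span or log fields at the end. The sketch below is plain Ruby under stated assumptions: the `RequestStats` class, the `measure` method, and the field naming scheme are hypothetical. In a Rails app the counters would more likely be fed from `ActiveSupport::Notifications` subscriptions (for example `sql.active_record`) than from explicit `measure` calls.

```ruby
# Hypothetical per-request/per-job collector; field names are illustrative.
class RequestStats
  attr_reader :counts, :durations

  def initialize
    @counts = Hash.new(0)
    @durations = Hash.new(0.0)
  end

  # Runs the block, counts one event of the given kind, and adds the elapsed
  # wall time. Returns the block's value.
  def measure(kind)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @counts[kind] += 1
    @durations[kind] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  # Flattens the collected numbers into a single hash of span/log fields.
  def to_fields
    fields = {}
    @counts.each { |kind, n| fields["#{kind}_count"] = n }
    @durations.each { |kind, seconds| fields["#{kind}_seconds"] = seconds.round(6) }
    fields
  end
end

stats = RequestStats.new
stats.measure(:db_read) { :row }
stats.measure(:db_read) { :row }
stats.measure(:cache_write) { true }
p stats.to_fields.keys
```

Keeping the collector as one object per request makes it straightforward to attach tenant ID, user ID, and active feature flags to the same set of fields before emitting them.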