Site Reliability Engineer (SRE)
Taiwan
Full Time
Mid Level
【What you’ll be doing】
SREs are responsible for the overall performance and reliability of SHOPLINE's infrastructure and products. You will be in charge of:
- Automation: SREs are passionate about automation and building tooling.
- Improving our user experience and maintaining service stability through SLA/SLO/SLI metrics.
- Incident Response (enhancing the on-call experience, tools, and procedures), including a comprehensive postmortem process and root cause analysis.
- Participating in on-call duties during non-working hours.
- Designing CI/CD processes to deploy SRE tools.
【Who we are looking for】
- A solid understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems.
- Advanced Linux expertise and a willingness to explore Linux internals.
- Proficiency with observability solutions such as Grafana, Prometheus, OpenTelemetry, Tempo, Loki, New Relic, and CloudWatch.
- Familiarity with SLI/SLO/SLA concepts for monitoring product availability.
- Experience with public cloud services such as AWS, GCP, and Azure.
- A basic understanding of Kubernetes, with some hands-on experience.
- Strong scripting skills.
- Development experience and familiarity with programming languages such as Ruby, Node.js, Golang, and Python.
【It'd be a plus if you have】
- Experience working in a fast-paced startup environment.
- Knowledge of building or maintaining e-commerce platforms.
- A basic understanding of Ruby on Rails (RoR) applications.
- Basic troubleshooting skills for Kubernetes and networking.