How Shuttlecloud Saves Time and Money by Monitoring with Prometheus

The New Stack · Beginner ·☁️ DevOps & Cloud ·9y ago

Key Takeaways

Shuttlecloud uses Prometheus for monitoring and metrics, relying on its flexibility and labels to save time and money. The team integrates it with PagerDuty and Grafana, and plans to move its stack to Kubernetes.

Full Transcript

[Music]

Thank you, Cisco, for sponsoring our day of podcasting at KubeCon. We had some great conversations. Thanks again to Cisco. You can learn more about Cisco and their microservices platform at mantl.io, that's M-A-N-T-L dot I-O.

Alex Williams: Hey, it's Alex Williams of The New Stack, here at KubeCon on Wednesday afternoon in Seattle, and I'm here with Ignacio Pérez Carretero.

Ignacio Pérez Carretero: Yeah, that's correct.

Alex Williams: Great. And you're with Shuttlecloud?

Ignacio Pérez Carretero: Yes.

Alex Williams: You presented here at KubeCon, and we're looking to talk to people who are actually using the Kubernetes technology. So tell us what you presented, and more about why you took the approach you did.

Ignacio Pérez Carretero: Sure. I talked about our experience with Prometheus, because we are a small startup, small in the number of people: we are seven people in the engineering team, and maybe 15 or 16 people in the whole company.

Alex Williams: Which company?

Ignacio Pérez Carretero: Oh, right, yeah. We are Shuttlecloud. What do we do? We offer our customers an API to easily migrate or import email and contacts. We help them grow by offering them a service to bring in their customers, so those customers can import their email.

Alex Williams: And does that create a database out of that, or is it more just a service that stores the information?

Ignacio Pérez Carretero: No, we don't store anything. I'll give you an example. One of our clients is Google, for Gmail. If you happen to have a Gmail account, which almost everybody does, you can go to Accounts and import your emails from Yahoo or, you name it, any other account. We actually support 247 providers. So we offer that service to Google, which offers that service to its users.

Alex Williams: So you're the API provider. For instance, if I'm adding a third-party service and need to integrate with Gmail, then I'm using your API?

Ignacio Pérez Carretero: That's correct.

Alex Williams: Okay, great. So that turns out to be a lot of data. Tell us about what you presented, and how you guys got started with Prometheus.

Ignacio Pérez Carretero: Sure. I started my presentation by telling a little bit about our history. At the beginning we didn't have the need for a whole monitoring system in the company, but as you can imagine, as we started to grow we needed some system. So we analyzed the different products on the market, we asked different people, the DevOps community in Spain for instance, and we decided to give Prometheus a try. We wanted to try it for a short period of time and, if it didn't work, try the next one. But it actually went pretty well.

Alex Williams: Was it the first one you tried?

Ignacio Pérez Carretero: Yeah, really. I said in the presentation that I'd love to tell the story like "and then we tried this one, and that one, and in the end Prometheus was the best," but we were so happy from the very beginning, and the implementation was much easier than we expected, so we kept on working with it.

Alex Williams: What were your requirements for this project? What were you trying to learn and achieve?

Ignacio Pérez Carretero: I don't think our case is a special one. We wanted to have some operations metrics: how our instances are performing, whether some service is up or down. We also wanted to monitor some business metrics, like how many migrations we are processing and their status, whether they are going right or wrong. As for the things that made us choose Prometheus: first, it's a time series database and it's very flexible. It has labels, so you don't have to decide from the very beginning which metrics you want to have; you can add them afterwards or edit them. That's one of the things we liked about it. Another is that there's no need for any external service. For instance, for Sensu you need a messaging system like RabbitMQ; Prometheus is independent, so you don't need anything else. And it's made in Go, so it's very easy to install, implement and give a try, which is what we wanted to do. So of all the solutions we wanted to try, the first one, because it was the easiest to start with, was Prometheus.

Alex Williams: Okay. And what were you doing before this?

Ignacio Pérez Carretero: We had some different systems, most of them in-house solutions, and a lot of manual actions that we had to do to monitor, so it was not fully automated. For the operations metrics, I gave this example, and I'm not saying it's correct, obviously it's not: we had some cron-based tasks that ran the checks on every single instance and sent an email in case something was going wrong. As you can imagine, if the instance is down there's no one to send the email, so there are many things wrong with that solution. And for the external services, like APIs, what I call black-box monitoring, how we are being seen from the outside, we trusted Pingdom. They also have an alert system; they ping all of your services and check that everything is all right with them.

Alex Williams: What was the drawback of Pingdom?

Ignacio Pérez Carretero: I haven't seen any drawback there. The main drawback was with the operations metrics and the business metrics, because an in-house solution built on those emails was not optimal. As we started to grow and scale, that didn't scale.

Alex Williams: How does Prometheus solve those problems, and how does it solve the problem of scale?

Ignacio Pérez Carretero: First of all, as soon as you install Prometheus you already have the node exporter, which offers a ton of metrics that can be useful, and that's by default, so you don't have to spend any time creating those metrics. Once you have the metrics, you can set alerts on them. We have an integration with PagerDuty, and we get paged directly for the thresholds we've set, so it's very straightforward.

Alex Williams: So you set thresholds in Prometheus, and when those thresholds are met, a PagerDuty response is generated?

Ignacio Pérez Carretero: That's correct. That could happen, or an email if it's not that important. You can also set the action you want the Alertmanager to trigger. I'll give you an example, although we have iterated on it since and no longer have it like this. At the very beginning, for hard drive usage, we had: if the hard drive is over 80% capacity, send us an email; if it's at 95% capacity, page someone. But there are a lot of very interesting things in Prometheus, and one example is the predict_linear function. You can create alerts like "if, at the current rate, this hard drive will be full in two days, send me an email." So I have two days in advance to plan the actions, and you don't have that urgency of "it's at 95 percent." As you can imagine, it's very difficult to set absolute thresholds. It depends on the rate: some drives might have something like 500 megabytes left and stay that way forever, and for others it depends on how fast they are filling up.

Alex Williams: So you're using Prometheus for these purposes. What are you using it in conjunction with? Are you using it with Kubernetes?

Ignacio Pérez Carretero: Not yet, but we're planning to move to Kubernetes.

Alex Williams: Okay. So how are you using it in relation to your overall technology stack? How is it being integrated across your platform?

Ignacio Pérez Carretero: As I said, for the operations metrics we have the node exporter. For the black-box monitoring, how our APIs are performing, there's the blackbox exporter, which is a component of Prometheus and works seamlessly. But we had to implement our own in-house exporter for the business metrics, because obviously that's something very custom. We keep all the statuses for all migrations in a non-relational database, and we digest that information to get some stats. Apart from that, we have an operations exporter that allows Prometheus to pull the data from that CouchDB. So through that exporter we have a direct connection from Prometheus to the real results of the migrations. This is the in-house solution we had to implement, and compared with other systems it was very easy to connect our engineering solution and Prometheus.

Alex Williams: So you get all this information in a real-time view. How do you visualize it? How are you actually seeing this information?

Ignacio Pérez Carretero: That's also something I mentioned: it's really straightforward to put Grafana on top of Prometheus. It literally takes minutes to connect them, and the language used to create the charts in Grafana is PromQL, which is precisely the same one used in Prometheus for your queries. So you try your query in Prometheus, Ctrl-C, Ctrl-V into Grafana, and you have the chart. It was very nice, very easy. Long story short, we have Grafana on top of Prometheus, and we have charts of how we're performing at any given time.

Alex Williams: How do you find PromQL?

Ignacio Pérez Carretero: I actually don't have experience with any other monitoring system, so I'm not the right guy to compare; this was my first experience with a monitoring solution. I found it really logical. Not obvious, in the sense that there is some syntax you have to learn, but I think it's quite understandable.

Alex Williams: What are the upstream requests that you're making now for Prometheus? Are you contributing to the project at all?

Ignacio Pérez Carretero: We are not yet because, as I said, we're only seven people. But in our backlog we have some things we want to collaborate on.

Alex Williams: Like what?

Ignacio Pérez Carretero: Right now there's one CouchDB exporter, but it only monitors metrics about how the database is performing, not its content. We are highly available, so we rely a lot on database replication, and we have some metrics that we are currently monitoring about CouchDB replicas. There's a feature in Prometheus, I'm sorry it's taking so long to answer, that can parse data from a text file, and we are doing it with that solution, and it shouldn't be like that. We're planning to contribute to that CouchDB exporter to add that functionality, so that there are some metrics about the content of the database.

Alex Williams: What are some of the other aspects of the presentation that maybe I didn't touch on?

Ignacio Pérez Carretero: That you don't have to have a lot of money or a big budget to implement Prometheus. That's why the talk is called "why it's good for your small startup", because we've been there. When you're a small startup you don't have the people and you don't have the resources to implement an expensive solution, and Prometheus is very straightforward, very easy to implement. Right now we are monitoring 200 instances with it. It might not sound like that many, but for a small team 200 is already a lot. We do it with only one medium instance in GCE, plus a micro instance for meta-monitoring, as we're still not in HA. So with only two instances and a 30-gigabyte hard drive we can monitor the whole 200-instance infrastructure. If you do the numbers, that's not a lot of money on a monthly basis compared to other solutions. In dollars, it's affordable.

Alex Williams: So you're running on AWS?

Ignacio Pérez Carretero: We're currently running on GCE. We used to run on AWS, but we migrated to GCE.

Alex Williams: Okay. And were there any other aspects of it, besides the affordability, that you touched on?

Ignacio Pérez Carretero: How simple it is, at least with what we've done with it; how easy it is to set up.

Alex Williams: How do you quantify that? What is it about that process that makes it easy?

Ignacio Pérez Carretero: Well, I would say that just by having the node exporter, which comes directly with Prometheus, and Prometheus itself, you already have a ton of metrics that are very interesting, and you only have to set up the alerts. In that sense you don't have to spend a lot of developer time getting metrics or installing Prometheus. So to begin with, just to give it a try, it's very easy and very affordable. And if you are constrained by resources, having something where you can see the results easily and very quickly is, I think, great.

Alex Williams: Right, so the usability speaks to the cost-effectiveness.

Ignacio Pérez Carretero: Yeah.

Alex Williams: Why do you guys use GCE?

Ignacio Pérez Carretero: I'd say it's because we have a lot of traffic to Google services and Google APIs. We are sort of a corner case, because we use the Google APIs a lot and the bandwidth to those Google APIs is free. So for us it's a cheaper solution, because we don't pay for the traffic to the Google APIs. That's why GCE is very good for us.

Alex Williams: You mentioned a few, but are there any other major things you'd like to see come out of Prometheus that would be helpful for you as you continue to deliver this service, which is obviously going to grow as more companies integrate APIs into their own services?

Ignacio Pérez Carretero: As we are moving to Kubernetes, we're very excited to try monitoring Kubernetes with Prometheus. We don't really know what we're going to see, but from the talks we're already very excited, and we're really looking forward to growing our Prometheus expertise. Maybe we'll have requests in the future, but with what we have currently we're not missing anything major.

Alex Williams: So what are you thinking about doing with Kubernetes?

Ignacio Pérez Carretero: I guess a lot of people might resonate with this: some of our instances are underused. So in order to optimize our resources, we could allocate different pods on the same node, rather than having, as we do right now, one instance per service.

Alex Williams: So right now you have one instance per service?

Ignacio Pérez Carretero: That's correct. For the services that are directly migrating data we don't have a problem, because we can fine-tune how much of the instance's resources we are using. But for other services we definitely can optimize.

Alex Williams: So you can run multiple containers on a host, for example, which is what you're not doing now. You're essentially running VMs.

Ignacio Pérez Carretero: Yeah.

Alex Williams: Okay, so now you're just starting that process?

Ignacio Pérez Carretero: Yeah, we're still at the very beginning, but excited and looking forward to it.

Alex Williams: Why would you, as a small startup, invest your time and energy in this?

Ignacio Pérez Carretero: Because we believe it's a clean solution, it's a good idea, and we could benefit from it.

Alex Williams: What makes it clean? What are the benefits?

Ignacio Pérez Carretero: First, the economic reason: we're going to save money by having fewer instances in place. Second, because we've already seen some use cases where people have moved their infrastructure to Kubernetes and it's working fine. And third, because we are in the process of making our system more of a twelve-factor app. I can't say that we are already there, but we are moving toward that goal. So that's why we believe it's something we can benefit from economically, and given the expertise and the use cases we've seen, we're going to go in the same direction.

Alex Williams: Well, thank you very much, Ignacio. I appreciate you taking the time to talk with us about what you presented on Prometheus here at KubeCon. Best of luck, and keep us posted on your developments with Kubernetes.

Ignacio Pérez Carretero: Thank you for having me here, and for your interest in listening to our story. Thank you.

Alex Williams: Thank you very much.

[Music]

Thanks to Cisco for sponsoring our day of podcasting at KubeCon. We had some great conversations. Thanks again to Cisco. You can learn more about Cisco and their microservices platform at mantl.io, that's M-A-N-T-L dot I-O.

[Music]
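
The disk alert described in the interview (email when a drive is on track to fill within two days, rather than paging at a fixed 95%) rests on linear extrapolation. In PromQL that is what `predict_linear(v range-vector, t scalar)` provides. The Python below is only a rough sketch of the idea, assuming a least-squares fit over recent free-space samples; the function names `predict_linear_sketch` and `fills_within` and the sample data are hypothetical, and this is not Prometheus's own code.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, free bytes)


def predict_linear_sketch(samples: Sequence[Sample], now: float, horizon_s: float) -> float:
    """Fit a least-squares line through the samples and evaluate it at now + horizon_s."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Centre timestamps on `now` so the intercept is the fitted value at `now`.
    xs = [t - now for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


def fills_within(samples: Sequence[Sample], now: float, horizon_s: float) -> bool:
    """True if the fitted trend predicts no free space left within horizon_s seconds."""
    return predict_linear_sketch(samples, now, horizon_s) <= 0


if __name__ == "__main__":
    # Hypothetical data: one sample per hour, losing 1 GB of free space per hour.
    hourly = [(h * 3600.0, 40e9 - h * 1e9) for h in range(6)]
    two_days = 2 * 24 * 3600.0
    print("alert:", fills_within(hourly, now=5 * 3600.0, horizon_s=two_days))
```

The same drive would stay silent if it lost space slowly, which is the point made in the interview: the alert depends on the fill rate, not on an absolute threshold.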

Original Description

With Prometheus the talk of KubeCon, CloudNativeCon, and last week's GrafanaCon, there is no denying that monitoring has become a focus for today’s enterprises. For those operating on a smaller scale or at a startup, choosing the right monitoring solution can not only save money but also considerably improve overall workflow efficiency, freeing resources to be put to better use elsewhere. For API migration and integration platform Shuttlecloud, there was never a ‘switch’ to Prometheus. With roughly 15-17 employees, the team originally planned on cycling through monitoring solutions until finding one that clicked. Once they got Prometheus up and running, they never got around to trying another one. On today's episode of The New Stack Makers, TNS Founder Alex Williams spoke with Shuttlecloud Software Developer Ignacio Pérez Carretero during KubeCon 2016 to hear more about Shuttlecloud’s experience working with Prometheus in production and their decision to migrate their stack over to Kubernetes. Listen on SoundCloud: https://soundcloud.com/thenewstackmakers/how-shuttlecloud-saves-time-and-money


Shuttlecloud's use of Prometheus for monitoring and metrics has saved the company time and money, and can serve as a model for other startups and enterprises looking to optimize their operations. By building on Prometheus's flexible, label-based metrics and integrating it with tools like PagerDuty and Grafana, organizations can build scalable and cost-effective monitoring systems.

Key Takeaways
  1. Set thresholds for alerts
  2. Create alerts for specific metrics
  3. Integrate with PagerDuty for direct paging
  4. Use the predict_linear function to predict future usage
  5. Implement an in-house exporter for business metrics
  6. Connect Prometheus to CouchDB
  7. Use Grafana for visualization
  8. Query data with PromQL
💡 Prometheus's flexibility and labels make it an ideal choice for monitoring and metrics, allowing organizations to build scalable and cost-effective systems that can be easily integrated with other tools and platforms.
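
Takeaways 5 and 6 can be illustrated with a small sketch of what an in-house exporter does: it exposes business numbers in the Prometheus text exposition format so that Prometheus can pull them. Everything specific here is an assumption made for illustration, not Shuttlecloud's implementation: the metric name `migrations_by_status`, the status values, and the stand-in `fetch_status_counts` function, which returns hard-coded numbers where a real exporter would query the migration database.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

METRIC = "migrations_by_status"  # hypothetical metric name


def fetch_status_counts() -> Dict[str, int]:
    """Stand-in for a database query; returns hard-coded, made-up counts."""
    return {"succeeded": 120, "running": 7, "failed": 3}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counts: Dict[str, int]) -> str:
    """Render one gauge with a `status` label per entry, in text exposition format."""
    lines = [
        f"# HELP {METRIC} Current number of migrations in each status.",
        f"# TYPE {METRIC} gauge",
    ]
    for status in sorted(counts):
        lines.append(f'{METRIC}{{status="{escape_label_value(status)}"}} {counts[status]}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Serve only the path Prometheus is configured to scrape.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_status_counts()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int) -> None:
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve(9400)` (an arbitrary port) and pointing a Prometheus scrape job at that address would make the counts queryable with PromQL and chartable in Grafana. Using a label for the status, rather than one metric per status, matches the flexibility described in the interview: new statuses appear as new label values without redefining the metric.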
