Frankly Speaking - Analyzing the CircleCI hack
Another story in bad communication and access management
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.
As the New Year starts, I’m hoping to write more newsletters with a better ratio of free to paid ones. However, I would like to especially thank all the paid subscribers that keep me going and make it possible for me to write more often. If you enjoy my content, please consider buying a paid subscription.
I have also removed the paywall from a previous paid post on how security needs more second-order thinking. Check it out!
LET’S BE FRANK
A new year, and already another major hack. Writing about these hacks does not bring me joy, and the whole situation is incredibly stressful for the security teams at both the company and their customers.
This week, I am going to discuss the CircleCI hack using the disclosures on their blog. To start, there’s no easy way to say this, but the initial communications around the security incident by CircleCI were terrible. It does seem like a somewhat severe hack given the actions they have asked customers to take.
I don’t blame the security team, which is probably working around the clock with its third-party incident response firm to figure out what’s going on. However, the leaders at CircleCI, especially the security leaders, could and should have done better!
Let’s break down the disclosure starting with the initial one on 1/4/23. First of all, this came out late at night on Wednesday, which is fine, but it didn’t provide any good information and only caused additional stress and confusion. This is a textbook example of communicating too early and without enough detail.
Let’s start with the first paragraph:
We wanted to make you aware that we are currently investigating a security incident, and that our investigation is ongoing. We will provide you updates about this incident, and our response, as they become available. At this point, we are confident that there are no unauthorized actors active in our systems; however, out of an abundance of caution, we want to ensure that all customers take certain preventative measures to protect your data as well.
Saying there are no unauthorized actors currently on the system is useless. Just because they aren’t active right now doesn’t mean they didn’t make off with credentials that let them get back in later. We are fresh off the LastPass hack, where a threat actor stole information that was later used to regain access, so nothing about this statement is comforting.
All I gather here is that they believe some security incident has happened but don’t want to jump to conclusions. At the same time, they want to mitigate the damage as quickly as possible without actually knowing much. Not a great start here…
Next paragraph on “preventative” actions.
Immediately rotate any and all secrets stored in CircleCI. These may be stored in project environment variables or in contexts.
We also recommend customers review internal logs for their systems for any unauthorized access starting from December 21, 2022 through today, January 4, 2023, or upon completion of your secrets rotation.
Additionally, if your project uses Project API tokens, we have invalidated those and you will need to replace them. You can find more information on how to do that in our documentation here.
Well… ok. As a security professional, I am worried now. I have to rotate all secrets stored in CircleCI, but what are those? Most security teams probably have no idea about the extent of secrets in CircleCI. This probably led to a slew of paging and waking up people late at night as well as customer support requests. I am pretty sure most security teams don’t even know how to rotate these secrets…
Also, they ask customers to review all internal logs for their systems. That’s probably a lot of logs. What am I even looking for? Is it still safe to run CircleCI? Should I shut down all deployments? If I were a security leader at this point, I would believe something bad has happened at CircleCI and that we need to shut down everything that uses it, given that credentials could have been leaked. This could bring companies to a standstill given how crucial CircleCI is to the deployment process. If I were a customer, I would be convening the executive team to decide whether to disrupt the business. I would also be calling an IR firm, such as CrowdStrike or Mandiant, for counsel on what to do and to check whether any of our critical systems were breached.
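To make "what am I even looking for?" a bit more concrete, here is a minimal sketch of how a team might narrow the log review: keep only events inside the window CircleCI named (December 21, 2022 through January 4, 2023) that used a credential stored in CircleCI and came from a source you don't recognize. The event fields (`time`, `credential_id`, `source_ip`) and the function name are my own assumptions for illustration, not anything from CircleCI's notice or any particular logging product, and in practice the window should extend until your rotation is complete.

```python
from datetime import datetime, timezone

# Window from CircleCI's notice: Dec 21, 2022 through Jan 4, 2023.
# The end bound is exclusive so that all of Jan 4 (UTC) is included.
# Assumption: extend WINDOW_END to whenever your own rotation finished.
WINDOW_START = datetime(2022, 12, 21, tzinfo=timezone.utc)
WINDOW_END = datetime(2023, 1, 5, tzinfo=timezone.utc)


def suspicious_events(events, ci_credentials, known_sources,
                      start=WINDOW_START, end=WINDOW_END):
    """Return events worth a closer look.

    events: iterable of dicts with hypothetical keys
            'time' (ISO 8601 string), 'credential_id', 'source_ip'.
    ci_credentials: set of credential IDs that were stored in CircleCI.
    known_sources: set of source IPs you expect (e.g. your CI runners).
    """
    flagged = []
    for event in events:
        ts = datetime.fromisoformat(event["time"])
        if ts.tzinfo is None:
            # Treat timestamps without an offset as UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        in_window = start <= ts < end
        used_ci_secret = event["credential_id"] in ci_credentials
        unknown_source = event["source_ip"] not in known_sources
        if in_window and used_ci_secret and unknown_source:
            flagged.append(event)
    return flagged
```

This only reduces the pile; anything it flags still needs a human to decide whether it is actually unauthorized access.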
It’s clear that the leaders at CircleCI didn’t think about how disruptive such a vague announcement can be to their customers. Now you have DevOps and security teams scrambling, unsure of what’s going on. It would have been nice to provide some technical information to give at least some guidance on what might have been affected. It also would have been good to know whether we could still use CircleCI.
Moving on to the 1/5 security update:
The number one question we’ve received from customers is, “Can I build?” The answer is yes.
Of course, this is the number one question! Why didn’t they think it would be, especially given how crucial CircleCI is to the deployment process? Who thought it was a good idea not to include it in the first announcement? Were they not sure if it was safe yet? But they said there was no active unauthorized access. I remain confused about this decision and how they didn’t anticipate this question.
CircleCI has now wasted many hours of security and DevOps teams’ time, which they spent speculating about whether they could still run CircleCI.
We also want to provide more details on our recommended actions for all customers.
Please rotate any and all secrets stored in CircleCI. There are multiple ways to do this, and we encourage you and your teams to use your preferred methods. Here is an approach you may follow:
Yes, this also would have been nice to have in the first announcement. CircleCI is probably used by numerous teams at each customer, and instructions would have made the remediation process easier than letting customers figure it out on their own.
In addition to these instructions, today, we created a tool for discovering all your secrets on CircleCI. This should assist you in creating an actionable list of items for rotation.
This also would have been nice to have earlier, given that there are probably a ton of secrets in CircleCI and no single person knows all of them. Did they know they were going to create this tool? If so, they should have said so in the first announcement, because I’m sure teams scrambled to build this themselves. If not, how did they think customers were going to figure out “all” their secrets in CircleCI?
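For a sense of what teams were likely scrambling to build, here is a rough sketch of a rotation checklist. It assumes you have already exported the names of your secrets along with where they live (the notice mentions project environment variables and contexts) and simply reports what is still left to rotate. The `location` labels, the function name, and the input shape are hypothetical; this is not CircleCI's tool and it does not call CircleCI's API.

```python
from collections import defaultdict


def rotation_checklist(inventory, rotated):
    """Group secrets that still need rotation by where they are stored.

    inventory: iterable of (location, secret_name) pairs, where location
               is a hypothetical label such as 'project:web' or
               'context:deploy'.
    rotated:   set of (location, secret_name) pairs already rotated.
    Returns a dict mapping location -> sorted list of pending names.
    """
    pending = defaultdict(set)  # a set drops duplicate inventory rows
    for location, name in inventory:
        if (location, name) not in rotated:
            pending[location].add(name)
    return {loc: sorted(names) for loc, names in sorted(pending.items())}


inventory = [
    ("project:web", "AWS_SECRET_ACCESS_KEY"),
    ("project:web", "NPM_TOKEN"),
    ("context:deploy", "DOCKER_PASSWORD"),
]
rotated = {("project:web", "NPM_TOKEN")}
remaining = rotation_checklist(inventory, rotated)
```

Even something this simple gives each team a list to work through and a way to tell when they are done, which is the part the first announcement left everyone to improvise.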
While customers are in the process of rotating keys, secrets, and variables, it may be helpful to add additional layers of protection to your CI/CD pipeline configuration.
There’s already enough going on. I don’t think suggesting additional security measures is going to make customers happy, especially if those measures are merely nice-to-haves. Customers are probably barely getting through rotating their secrets. CircleCI should focus on telling customers what is absolutely necessary.
As a company deeply invested in iteration and improvement, feedback from our customers on incident management is always welcome and appreciated.
It’s clear that many security teams thought the initial disclosure and the handling of the incident were subpar. I don’t blame them, and I agree with them. By handling this poorly and coming across as inexperienced, the leadership team is eroding customer trust.
We understand that many of our North American customers experienced late nights and on-call rotations once our guidance to rotate secrets was released at 6:30 pm PT / 9:30pm EST on Wednesday, January 4. We erred on the side of getting information out as fast as possible to minimize any potential exposure time. We also know that as a global company with customers in almost every country, there is no good time to disclose a security incident except “as fast as possible.”
Is this an apology for a bad initial communication? A late-night communication would have been ok if the disclosure had been informative and actionable rather than creating more questions than answers.
Moving on to the next update on 1/6 at 17:52 UTC.
Our team is working to take every action available to assist customers in the mitigation of this incident.
What? Why didn’t they say they were going to do this earlier or in the last update? Security and DevOps teams could have been focused on doing other work if they knew that CircleCI was going to take some action on their behalf. They should have been clear on what actions they plan to take and what actions customers needed to take. This should have been part of earlier communication.
Conclusion
Unfortunately, this will go down as another example of how not to communicate with customers. The initial communication was vague, and it seemed like CircleCI didn’t answer basic questions and/or figure out how disruptive their disclosure could have been. They didn’t answer the basic question: can I still run CircleCI safely?
They told customers what actions to take, but they didn’t give customers a good idea of why they should be taking them. Maybe CircleCI doesn’t fully know what’s going on itself, but it should have given some sense of what was affected. It seems like their secret storage was compromised. If that’s the case, they should have said so.
Anyway, despite all these disclosures and actions, I am still left wondering what happened. Will there be further issues? With that said, I know everyone is working hard to mitigate issues here, especially the CircleCI team. My main complaints are meant for the communication and leadership team at CircleCI. Good luck everyone!