It probably won’t come as a shock to you that as I was writing up my last post on IoT and my new Geiger counter I was mentally reviewing all the things that
scared the crap out of me had me concerned security-wise. I don’t mean the apocalyptic visions of Fallout, but about the fact that I have a device I don’t necessarily trust sitting on my network constantly feeding data to a remote server without much control by me.
I’m predictable like that.
Upon further review I realized I wanted to write up my thoughts on how I would protect against such an unknown, but really… you just need to stick it on an isolated network. You can’t see me, and I can’t see you. I’ll leave it to the networking wonks to decide what isolated really means in this case.
That of course left me with a very short and very boring post.
It then occurred to me that I’ve never really written in detail about different types of threats or risks or how you model them when developing a system. Of course, I don’t want to write in detail about different types of threats — there’s plenty of content out there already about this sort of thing. What I do want to do though is talk about how you model the risks so you can work through all the potential threats that might exist.
In keeping with the theme (because it’s fun and highly improbable) we’re going to build a fictitious radiation monitoring system just like the uRADMonitor, but we’re going to look at how you would evaluate the risks to such a system and how you might mitigate them.
I don’t have any affiliation with the uRADMonitor project aside from the fact that I have a device of theirs in my office on my network. I can’t speak to their motivation for building the system and I can’t say what they’ve done to protect their system. This is purely a thought experiment on how I would build and protect a similar system.
Additionally, this is not necessarily a model where I’ve listed every possible risk or threat and rigorously verified solutions work. In fact, I’ve left out a number of things to consider.
Remember: it’s just for fun.
Projects are created to solve particular problems (also, sky is blue: report). This monitoring network is no different. We’ll review the problem as we see it, and then we’ll create a design to solve the problem. We’ll then review the design and see what kinds of risks we might find and how we can solve for them as well. We’ll iterate on this a few times and hopefully come up with a secure solution.
Not surprisingly I’ll go off on tangents about things I find interesting and gloss over other things. They may not be pertinent to the discussion at all.
Radiation levels affect the health and well being of the population. Sudden significant changes in levels can have dangerous health effects and indicate signs of environmental accidents or geopolitical unrest. An unbiased real time record of radiation levels needs to be created and monitored so appropriate actions can be taken in the event of serious danger.
Build and deploy a distributed network of radiation monitoring agents that persist data to a central store so the data can be reviewed by analysts and automatically notify the appropriate people when significant events occur.
If you’ve ever done product design that cryptic answer might be all you
are going to get need from the product owner. So we need to break that down a bit. We’ve got a whole bunch of agents distributed throughout the world sending data back to a central server. This data should be consumable by analysts (scientists, safety authorities, etc.) either through an API (a program might be analyzing the data), or through a graphical interface (an actual person wanting to review data). These are easy enough to develop — create writable web API’s for the agents, a web portal for the people, and read only web API’s for the programs.
I’d say we glossed over the solution a bit, and we did that purposefully because investing a lot of effort into designing a system without understanding the risks is, well, a waste of time. We just need enough to get things moving. We have a good idea of what we want to build for now, so lets try and understand some of the risks. Our initial design looks something like this:
When we consider the risk of something, we consider the chance that thing might occur. We need to weigh the chance of that thing occurring with the effect it would have on the system. We then compare the cost of the effect with the cost of mitigating the effect and pick whichever one is lower. We generally want the lowest possible level of risk achievable based on the previous formula. In our case it’s actually pretty easy to work out because anything that disrupts the capabilities of the network is considered high risk. Let me see if I can explain why.
The idea of having a distributed network of things connecting back to a central server is not novel. The fairly detailed diagram above could easily be confused with a thousand other diagrams for a thousand other systems doing a thousand different things. There are a lot of commonly considered risks related to this and they can probably be summed up into these properties:
- Server availability – Is the server online and accepting requests whenever an agent wants to check in, or when an analyst wants to review the data?
- Agent availability – Is an agent online, able to measure the things it needs to measure, and have a connection back to the server to check in?
- Data accuracy – Do agents check in data that is measured properly and hasn’t been fiddled with?
This is a pretty broad but reasonable assessment for a number of systems. The risk is that any of these properties answer false because a failure in any one of these properties means the system can’t function as intended. We need to consider what sorts of things can affect these properties.
Here’s the thing: it needs to be available 100% of the time and absolutely accurate. It has the potential to influence the actions of many people regarding things that can kill them. This is one of those things that most security people really only dream about building (we have weird dreams). There can be serious consequences if we fail. Therefore we need to consider even the tiniest chance of something occurring that could affect those three properties and mitigate.
As system designers we need to go nuclear. (yes I know that was a bad pun. I’ll show myself out.)
In our case the system needs to withstand serious catastrophic events simply because it’s intended to monitor serious catastrophic events. If such an event occurs and the system can’t withstand the effects then we’ve failed and the whole point of the project can’t be realized. Conversely, it also needs to withstand all the usual threats that might exist for a run-of-the-mill distributed system.
Normally we would need to do an analysis on the cost of failure so we could get a baseline for how much we should invest, but this is my model and I’m an impossible person. The cost of failure is your life. (muahaha?)