The increase in network-connected devices the past years has been something of a dual-edged sword. While on one hand it’s really nice to have an easy and straight-forward method to have devices talk with each other, this also comes with a whole host of complications, mostly related to reliability and security.
With WiFi, integrating new devices into the network is much trickier than with Ethernet or CAN, and security (e.g. WPA and TLS) isn’t optional any more, because physical access to the network fabric can no longer be restricted. Add to this reliability issues due to interference from nearby competing WiFi networks and other sources of electromagnetic noise, and things get fairly complicated already before considering which top-layer communication protocol one should use.
In this article we’ll be looking at implementing such a network-based system, securing a WiFi network with TLS, and the use of MQTT in combination with a proxy. I’ll illustrate this using experiences and lessons learned while working on this Building Management and Control (BMaC) project that I covered in a previous article.
Getting MQTT into your system
Message Queuing Telemetry Transport (MQTT) is a small, binary protocol that was developed by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link in 1999. Its version 3.1 was submitted in 2013 IBM to OASIS for standardization. Another version of MQTT is MQTT-SN, which is designed for lower-bandwidth, non-TCP networks, such as Zigbee, UDP and Bluetooth.
Due to its compact size and simple, client-server architecture, it is highly suitable for connecting larger and smaller networks of sensors, especially in high-latency, low bandwidth situations. It uses a subscribe-publish model, where clients can subscribe to topics on which others can publish messages. These messages can be persistent, have guaranteed delivery and (as of version 5) can automatically expire if they cannot be delivered.
The obvious advantage of MQTT is that it supports everything from always-online, high-bandwidth clients, to low-powered, remote sensor nodes which just wake up every week, dial into a satellite link and send some sensor readings while updating their calibration settings from data which they receive at the same time from some other client.
The use of MQTT (with the Mosquitto MQTT broker) in the BMaC project was initially more of a coincidence, with us using it mostly because an MQTT client was already integrated into the framework we were using on the microcontrollers. None of us had really thought about the advantages or disadvantages of MQTT over any alternatives. Now, years later, it’s easy to see why MQTT was the right choice. While running it on an internal, TCP-based network, we got the guaranteed delivery aspect of TCP along with its built-in checksum verification, with the MQTT protocol itself putting no constraints on the payload it can carry, whether it be text-based or binary.
Real competition for MQTT does not really exist. AMQP is also fairly popular, but it targets desktop and server systems in an enterprise setting, and doesn’t really scale down to RAM-constrained 8-bit microcontrollers. Further, AMQP also defines an encoding scheme for the payload, whereas MQTT leaves one free to use whichever encoding or serialization scheme one wishes to use.
With BMaC we could thus develop our own payload format that would be sent to and from the ESP8266-based nodes. This resulted in a compact, binary format using just a few bytes at most that sufficed to configure nodes over MQTT as well as adjust the fan and relay settings.
Securing the System
The best way to secure a system is through the practice of security in depth. That means that every part that could be exploited should be secured in some fashion. Assuming a system like that of BMaC, this means that the physical hardware is all inside an office building, which has its own security system installed.
This security system can be simple mechanical locks, or some NFC tag-based system. Sensitive areas like server rooms require their own access keys or permissions associated with the NFC tags. This practically eliminates any risk of unauthorized individuals gaining access to the hardware, let alone perform any nefarious actions.
Wireless networks for a system like this are of course secured by WPA2 or similar, meaning that without the right password or certificate, one cannot connect to the wireless network. Any traffic on the network will consequently be encrypted. This shifts the most likely threat to those who somehow have gained access to the network, whether through legal means, or because the WiFi SSID and password were in a photograph that got published on the public company blog (true story).
At this point we have an okay level of security, but the missing ingredient is to secure the traffic between the nodes and the backend servers, meaning either TLS encryption (very common), or Elliptic Curve Cryptography (ECC), which would be the superior choice because it’s faster, requires significantly less RAM, and has much smaller certificates. Unfortunately ECC has taken the backseat to TLS, mostly on account of it being patent-encumbered for much longer.
This made TLS the easier type to integrate into the BMaC project, as adding ECC would have meant ditching the axTLS library in the framework which we were using for the ESP8266 nodes and integrating an alternate library that supports ECC and also fits in the limited RAM provided by this microcontroller.
The Part Where Things go Boom
We quickly found out that the default handshake setting in TLS encryption for TCP connections causes massive problems for an ESP8266 and similar MCUs which tend to have less than 30 kB of SRAM available when this handshake event occurs.
The default TLS configuration dictates namely that the maximum TX/RX buffer sizes are allocated when a secure connection is attempted, being 16 kB each, or 32 kB in total. With non-trivial firmware this results in the MCU running out of memory and the MCU resetting. Fortunately this setting can be changed on the side of the server, as noted in this article on TLS. This would allow the server to set the TLS buffer size to something that would fit in the MCU’s SRAM.
Sadly for BMaC, the server on the Mosquitto MQTT broker didn’t have this as a configuration setting, requiring us to change it in the source code and recompile the server. That seemed a bit of overkill.
Instead we opted to add a different TLS endpoint to the system, using HAProxy as an intermediate. We configured an interface with TLS-only access that simply routes any decrypted data to Mosquitto via the localhost loopback interface, and set the tune.ssl.maxrecord property to 2 kB, for 4 kB of buffer space on the ESP8266. After enabling both server and client certificates on the HAProxy and BMaC node firmware respectively, we had a TLS-encrypted connection up and running, ensuring that not even our colleagues could sniff on what we were doing.
Putting it Together
By the time we had finished wiring up the first controller for the air conditioning system at the office, the BMaC project consisted out of a wireless network of motion, temperature, CO2, air pressure and coffee usage sensors, along with a bunch of relays and fan controllers, all tied together using a central backend server and secure MQTT connections..
After getting the network set up, with MQTT secured using client-side certificates to make sure that only genuine BMaC MQTT clients could connect, it was very nice to be able to focus on getting the commands and data transferred between the nodes and the backend. The only issue that really annoyed me there was the lack of an MQTT desktop client that would allow me to do MQTT monitoring, active topic discovery and be directly compatible with binary payloads instead of assuming that one would only ever use MQTT for text-based payloads.
This led to me developing a C++/Qt-based MQTT desktop client called MQTTCute. It’s the client I wish I would have had right from the beginning as I was setting up the whole system, trying to get an idea of what was being sent around on the MQTT topics. Since we ended up using a binary protocol for BMaC, having a built-in hex view function in the desktop client would have been invaluable.
Regardless, if we had to do it all again, with the knowledge we gained, we would pretty much still have picked the same route. Likely we would try to use ECC instead of TLS, however, just to save ourselves the overhead of using an additional TLS endpoint and proxy server.
We also found that a number of MQTT libraries assumed text-based payloads, and would use C functions like strlen() and kin. Many of them have since received pull requests from yours truly so that those libraries now happily accept any kind of binary data one wishes to send via MQTT, including images.
The Elephant in the Room
When it comes to MQTT and similar client-broker systems, there’s always the argument that they cannot be reliable because they have a single point of failure in the form of the MQTT broker. This is definitely a valid point, but also not nearly as valid as one might assume.
MQTT brokers tend to run on reliable server hardware, in the case of BMaC as a Linux virtual machine instance on a storage cluster. For the broker to suddenly vanish off the network would require the kind of catastrophic failure that’d cripple the company’s network along with it.
One could conceivably set up a second, fall-over MQTT broker on a secondary address, but that would be a lot of work without good cause. In our own year-long BMaC development process, we had zero failures of the Mosquitto broker and more issues with glitches in the (old) WiFi access points.