As I work through setting up my OpenWRT router, one of the things I am interested in is better monitoring and visualization of data from the router itself. This post is about some of the options for doing this.
In general, the problem can be broken down into different pieces:
- Collecting the data
- Storing and indexing the data (to support faster visualizations)
- Visualizing the data
Further, we also have a choice of where a particular piece of functionality runs – it can run on the node being monitored (in this case the OpenWRT router) or on another node in your network. IMO you should generally prefer the latter: it isolates your production service from the non-critical monitoring service, reduces software dependencies and attack surface on the production node, and allows for beefier hardware and more flexible environments for downstream processing of the data. With the collected data being processed elsewhere in the network, the problem can now be broken down into:
- Collecting the data
- Shifting the data to collection node
- Storing and indexing the data
- Visualizing the data
Note that of these, only the first two definitely require something to be present on the node being monitored. The rest can be distributed across your network, and storage can be separated from compute as well.
Doing everything locally
Let’s discuss options where we do everything locally. These are fine if you want basic metrics and graphs.
- luci-app-statistics package. This allows us to collect and visualize data on the router’s web interface. Uses collectd as the backend for collecting the monitoring data.
- luci-app-nlbwmon package. Allows monitoring of network usage by clients. Uses nlbw as the backend for collecting the monitoring data.
- A bunch of other options for bandwidth and network monitoring listed in the OpenWRT wiki.
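As a concrete sketch of the first option, luci-app-statistics is driven by UCI configuration on the router. Something along these lines installs it with a couple of collectd plugins and enables them (the exact section and option names below are illustrative – consult the defaults shipped with your release):

```
# Install the LuCI statistics app and two collectd plugins
opkg update
opkg install luci-app-statistics collectd-mod-cpu collectd-mod-interface

# /etc/config/luci_statistics (sketch)
config statistics 'collectd_cpu'
        option enable '1'

config statistics 'collectd_interface'
        option enable '1'
        option Interfaces 'br-lan'
```

After restarting the luci_statistics service, the graphs appear under the Statistics tab of the LuCI web interface.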
Options for collecting data
Before going into options for collecting data, we have to recognize that there is coupling between the four pieces listed above. At the very least, each piece needs to be able to understand, ingest and transform the input coming from the previous step, regardless of what that choice was. In practice this means that the choice made for collecting the data adds constraints or complexity to shifting the data to the collection node, and so on. In RFC parlance, the protocol definition matters once you get into the details and you cannot wish it away as an abstraction. This also pushes us to consider the problem from both ends – where the data originates vs. where, by whom and how the data is consumed.
Finally, we can also recognize that anything is possible – it is, after all, just code. So even an option that seemingly can only collect and store data locally can be made network-visible by exposing the written data through our own server process. The only gradient is how much effort something requires to bootstrap, the time you have for it and the cost of the long-term fight against entropy when maintaining the deployed solution.
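To make that point concrete, here is a toy Python sketch of wrapping a local-only collector in our own server process: it assumes some collector periodically writes to a local file (the `stats.txt` name is made up) and simply serves that file's latest contents over HTTP so anything on the network can pull it.

```python
import http.server
import threading

STATS_FILE = "stats.txt"  # hypothetical file a local collector writes to


class StatsHandler(http.server.BaseHTTPRequestHandler):
    """Serve the current contents of STATS_FILE on every GET request."""

    def do_GET(self):
        try:
            with open(STATS_FILE, "rb") as f:
                body = f.read()
        except OSError:
            self.send_error(404, "no stats collected yet")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve(port=0):
    """Start the server on a background thread; returns (server, bound port)."""
    server = http.server.HTTPServer(("127.0.0.1", port), StatsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

This is exactly the kind of bootstrap-plus-maintenance cost the gradient above refers to: twenty lines to start, but now it is yours to patch, monitor and keep alive.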
In order to make good choices here, we need to understand not just the systems for collecting data but also what kinds of data we want to collect and expose which can then help us choose the best systems for that purpose. Multiple options are very likely needed due to the diversity of the data.
Here are some options for collecting data in an OpenWRT system for monitoring and analytics:
- collectd – This is the grand old daddy of monitoring systems in the Unix world. collectd models all collected data as numeric time series, which means that exposing non-numeric data is challenging if not impossible. collectd comes with a plugin system. In OpenWRT there’s also a LUA collectd plugin that allows you to create custom LUA scripts that integrate with collectd to produce metrics not served by existing plugins. collectd can store data in a local file through RRD, or you can use the write_http plugin to push the data to a collectd-compatible HTTP endpoint during collection runs.
- prometheus-node-exporter-lua – This exposes data for the OpenWRT node in a Prometheus-compatible manner. It is a service written in LUA with a plugin architecture where you can add new LUA scripts as collectors. Note that this package can be considered a “client” package that satisfies the “Collecting the data” part of the problem, as opposed to Prometheus itself, which is more of a combination of “Shifting the data to the collection node” and “Storing and indexing the data”. This service exposes the collected data in a pull fashion when an HTTP client pulls data from the configured listen_interface and listen_port. There is no storage as part of this collection mechanism – users are expected to create their own storage by polling the HTTP endpoint periodically and storing the returned data themselves.
- ulogd – Exposes network filtering and monitoring data. This is a mechanism to monitor the traffic going through the router. Supports logging output in different formats, including to MySQL, SQLite and PostgreSQL databases.
- Telegraf – Another plugin based collection framework that can expose the collected data to a variety of collection endpoints.
- Syslog-NG – Enhanced system logging daemon for Linux and Unix-like systems that gives the ability for traditional Unix syslog output to be pushed to network, databases etc.
- Many of the tools discussed in the previous section – they can be made to work with collectors outside of their ecosystem.
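To illustrate the push side of this, a collectd configuration along the lines below loads a couple of collection plugins and ships their output off the router with write_http (the plugin names and JSON format are real collectd features; the collector URL is an assumption for illustration):

```
# /etc/collectd.conf (sketch)
LoadPlugin cpu
LoadPlugin interface
LoadPlugin write_http

<Plugin write_http>
  <Node "collector">
    # Hypothetical receiver elsewhere on the LAN
    URL "http://192.168.1.10:9103/collectd-post"
    Format "JSON"
  </Node>
</Plugin>
```

With this in place, the router keeps only the collection responsibility and a receiving service elsewhere handles storage and indexing.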
As you can see, the options are not mutually exclusive and none of them fully address all possible monitoring needs that one might have. Using a combination of them is likely needed to get a comprehensive view of the node along with debug data when deep dives are necessary.
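To show what the consuming end of the prometheus-node-exporter-lua pull model looks like, here is a simplified Python sketch of a poller's parsing step. It turns the Prometheus text exposition format into (name, labels, value) records; the sample payload in the test is made up, and a real poller would fetch the text over HTTP and write the records to storage.

```python
import re

# Matches sample lines of the form: name{label="value",...} number
# Comment lines (# HELP / # TYPE) and blank lines are skipped.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Yield (name, labels_dict, value) for each sample line in the payload."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines this simplified parser does not handle
        name, _, labels_raw, value = m.groups()
        labels = dict(LABEL_RE.findall(labels_raw)) if labels_raw else {}
        yield name, labels, float(value)
```

This is deliberately minimal – it ignores timestamps, escapes and histogram conventions – but it shows why the pull model needs a consumer: the exporter only answers HTTP requests, and everything else is the poller's job.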
In future posts in this series, I will cover the other parts of this problem and how to integrate different collection mechanisms into a good monitoring solution for an OpenWRT router.