Kibana dashboard for home network traffic

Monitor home network traffic with OpenWRT and Syslog-ng

(Last Updated On: 04/29/2019)

I wanted to see what happens on my home network. Is there something going on I should be aware of? Is there any device making suspicious connections, like phoning home? I will use OpenWRT and syslog-ng to get the answers, and Elasticsearch to get analytics.

[Image: Kibana dashboard showing data from OpenWRT traffic syslogs]

SOHO routers usually do not have many resources, and mine is no exception. Therefore I needed a solution that uses as few resources as possible but is still capable of answering those questions. My solution uses connection tracking data from the main OpenWRT router: offload the information from OpenWRT to a central syslog server, enrich it with GeoIP, reverse DNS and session length metadata using syslog-ng, then analyze the logs with Elasticsearch.

The first part of this blog series answers where the packets come from and go to, along with some metrics. What is inside the packets is a topic for other posts.

Logging connection tracking data with OpenWRT and syslog-ng

My original idea was to log the SYN and ACK,FIN packets with iptables on the FORWARD chain and correlate them. However, it did not work as I planned. The most important data, like network source, destination and port numbers, are included in the syslog messages. However, the logs cannot easily be correlated into session data, because there is no common identifier in iptables logs that would be unique to any given connection. (The stream ID would be, but it is encoded in TCP options.) Logging all packets would simply kill the performance, so it is not worth it. I needed an alternative solution.
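
For reference, a minimal sketch of that abandoned approach could have looked like this (illustrative only; the log prefixes are my own):

# Log TCP connection openings (SYN) and closings (ACK,FIN) on the FORWARD chain
iptables -A FORWARD -p tcp --syn -j LOG --log-prefix "conn-new: "
iptables -A FORWARD -p tcp --tcp-flags ACK,FIN ACK,FIN -j LOG --log-prefix "conn-end: "

The resulting messages carry addresses and ports, but nothing that reliably ties the SYN log line to the matching FIN log line of the same connection.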

Finding an essay about Flow-based network accounting in Linux turned the tide. I realized that ‘nf_conntrack’ in the Linux kernel (netfilter) actually keeps track of every connection throughout its lifetime (even UDP, which is stateless). I only needed a tool to get that data off the OpenWRT router, preferably to syslog-ng. The essay mentioned many tools, but ulogd looked the most promising.

Ulogd is capable of logging connection tracking data to local syslog. The following example shows a NEW and a DESTROY event of a specific connection logged by ulogd.

Mar 13 15:03:57 openwrt ulogd[21765]: [NEW] ORIG: SRC=172.18.0.227 DST=1.2.3.4 PROTO=TCP SPT=57534 DPT=443 PKTS=0 BYTES=0 , REPLY: SRC=1.2.3.4 DST=5.6.7.8 PROTO=TCP SPT=443 DPT=57534 PKTS=0 BYTES=0
Mar 13 15:09:00 openwrt ulogd[21765]: [DESTROY] ORIG: SRC=172.18.0.227 DST=1.2.3.4 PROTO=TCP SPT=57534 DPT=443 PKTS=9 BYTES=3371 , REPLY: SRC=1.2.3.4 DST=5.6.7.8 PROTO=TCP SPT=443 DPT=57534 PKTS=8 BYTES=1301

Note: 1.2.3.4 represents a website, while 5.6.7.8 represents the public IP of my home network.

Configuring ulogd on OpenWRT to send conntrack events to syslog-ng

My OpenWRT systems already send their syslog to a remote central syslog server; OpenWRT uses logread to forward syslog remotely, and the remote server runs syslog-ng. Therefore I only had to configure ulogd to send the connection tracking events to local syslog instead of a file.
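
In case your OpenWRT does not forward its syslog yet, the setup is roughly the following (a sketch; the server address and port are placeholders):

# On OpenWRT: tell logread where to send the logs (stored in /etc/config/system)
uci set system.@system[0].log_ip='192.168.1.10'
uci set system.@system[0].log_port='514'
uci set system.@system[0].log_proto='udp'
uci commit system && /etc/init.d/log restart

# On the central server: a matching syslog-ng source
source s_net {
    network(transport("udp") port(514));
};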

Fortunately, ulogd can send the events to many destinations. I found a post about logging connection tracking events with ulogd, and it helped me configure the service properly.
At the following link you can find the complete configuration of ulogd I created for OpenWRT 18.06. Nevertheless, I describe the details below as well.

  1. First you have to install ulogd and some of its modules. You can do this either in LuCI or on the command line.
    root@openwrt:~# opkg update \
    && opkg install ulogd ulogd-mod-nfct ulogd-mod-syslog ulogd-mod-extra
  2. The configuration of ulogd uses an INI-style syntax. Two sections will be important for us: [global] and [ct1].
    In the [global] section, after the list of plugins, there are presets of stacks. Stacks are lists of plugins, and they work like commands piped together: there are input plugins, filter plugins and output plugins.
  3. Look for the comment below. We are going to adjust the stack belonging to that comment like this.
    # this is a stack for flow-based logging via LOGEMU
    stack=ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,sys1:SYSLOG
  4. Look for the section called [ct1]. We are adding a new configuration element called hash_enable. Disabling hashes makes ulogd log both NEW and DESTROY events separately; otherwise it would only log DESTROY events. Although a DESTROY event contains everything, we need the NEW events as well for their timestamps, which we will use for building the session metadata.
    [ct1]
    hash_enable=0
  5. You can check the configuration by starting ulogd manually.
    root@openwrt:~# ulogd -v
    Mon Mar 11 15:42:51 2019 <5> ulogd.c:843 building new pluginstance stack: 'ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,sys1:SYSLOG'
    Mon Mar 11 15:42:51 2019 <5> ulogd_inpflow_NFCT.c:1399 NFCT plugin working in event mode
  6. The last step is to enable and start the service.
    root@openwrt:~# /etc/init.d/ulogd enable
    root@openwrt:~# /etc/init.d/ulogd start

That is all you have to do to make OpenWRT send its connection tracking events to syslog-ng.
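
Before checking the central server, you can quickly verify locally that the events really show up in the syslog stream (a quick check, assuming the stock logread utility):

root@openwrt:~# logread -e ulogd

If the configuration is correct, you should see [NEW] and [DESTROY] lines similar to the samples above.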

Processing ulogd log messages from OpenWRT with syslog-ng

Note: the complete syslog-ng configuration can be found on GitHub.

My goal is to parse the log messages and enrich them with metadata. Syslog-ng provides many parsers to use out of the box. Below is a short overview of the parsers I use in this case.

Metadata we add to logs                          Syslog-ng parser providing the metadata
separate upstream and downstream metrics         csv-parser()
parsing key-value pairs from all streams         kv-parser()
creating session start, session end and length   grouping-by()
GeoIP metadata                                   geoip2()
Reverse DNS for clients and servers              python()

Some parsers can be chained together in a single parser{} block. They behave like commands piped together in a Linux shell. One parser’s output will feed the input of the next parser. Thus their order is important.

The first block of parsers looks like this. Please refer to the ulogd log samples above to get an idea of why I do this. I also added comments to the config for clarification.

        parser {
            # Split a single log line into two parts around the comma character
            # Name the first part as ORIG and the second as REPLY
            csv-parser(
                columns(ORIG, REPLY)
                delimiters(chars(","))
            );
            # Parse the part of the log message available in macro ${ORIG}
            # the separator is the = character
            # Prefix it with "nf_orig."
            kv-parser(
                prefix("nf_orig.")
                template("${ORIG}")
            );
            # Do the same with the other part of the log message
            kv-parser(
                prefix("nf_reply.")
                template("${REPLY}")
            );
            # Lookup geoip data for the destination (server)
            geoip2(
                "${nf_orig.DST}",
                prefix( "geoip2." )
                database( "/etc/syslog-ng/GeoLite2-City.mmdb" )
            );
        };

Further parsers like grouping-by() and python() will be discussed later.

Correlating log messages from OpenWRT with syslog-ng

The central syslog server receives two types of log messages for each connection: one from the NEW event and another from the DESTROY event. These two mark the beginning and the end of a session. I will use the grouping-by() parser to correlate these messages into one context and get the session length metadata. The admin guide has a flow chart about how messages are added to the context and how it gets terminated; you may want to read that in advance.

The parser uses key() and scope() to build up a context and identify which messages need to be added to it.

Specifying key() requires already-parsed data. My setup can be translated to this: “Messages containing the same SRC, DST, SPT and DPT values in the ORIG part of the message belong to the same connection, as long as they come from the same host.”

parser p_correlate_session_data {
    grouping-by(
        key("${nf_orig.SRC}/${nf_orig.DST}/${nf_orig.SPT}/${nf_orig.DPT}")
        scope("host")
        # for the sake of completeness I provide "where"
#        where(match("ORIG" value("MESSAGE")))
        having(match("DESTROY" value("MESSAGE")))
        aggregate(
            value("nf.SESSION_START" "${ISODATE}@2")
            value("nf.SESSION_END" "${ISODATE}@1")
            value("nf.SESSION_LENGTH", "$(- ${UNIXTIME}@1 ${UNIXTIME}@2)")
            value("MESSAGE" "Session completed; client='${nf_orig.SRC}'; server='${nf_orig.DST}'; destination_port='${nf_orig.DPT}; protocol='${nf_orig.PROTO}'; session_lenght='${nf.SESSION_LENGTH}'\n")
            inherit-mode("context")
        )
        inject-mode("pass-through")
        # destroy events sometimes arrive later than 2 minutes, even when a client app is already closed (ssh, telnet)
        timeout(600)
    );
};

The context is closed and evaluated either when a message arrives that matches the filter specified in having(), or when the timeout() expires.
Important! The timeout is currently set to 10 minutes. Connections longer than 10 minutes will show up as 10 minutes long in Elasticsearch.

The evaluation aggregates the context and creates the new name-value pairs specified with value(). For example, it creates a new MESSAGE. This message is logged to the same place where the received logs are stored. This is how such a new message looks.

{
   "DATE" : "Mar 14 10:02:30",
   "PRIORITY" : "notice",
   "HOST" : "openwrt",
   "nf_orig" : {
      "DST" : "1.2.3.4",
      "PROTO" : "TCP",
      "BYTES" : "482",
      "SPT" : "35604",
      "SRC" : "172.18.0.20",
      "DPT" : "80",
      "PKTS" : "6"
   },
   "FACILITY" : "daemon",
   "MESSAGE" : "Session completed; client='172.18.0.20'; server='1.2.3.4'; destination_port='80; protocol='TCP'; session_lenght='134'\n",
   "nf" : {
      "SESSION_END" : "2019-03-14T10:02:30+01:00",
      "SESSION_START" : "2019-03-14T10:00:16+01:00",
      "SESSION_LENGTH" : "134"
   },
   "PID" : "21765",
   "geoip2" : {
      "continent" : {
         "code" : "EU",
         "names" : {
            "en" : "Europe"
         },
         "geoname_id" : "6255148"
      },
      "city" : {
         "names" : {
            "en" : "Budapest"
         },
         "geoname_id" : "3054643"
      },
      "country" : {
         "geoname_id" : "719819",
         "is_in_european_union" : "true",
         "iso_code" : "HU",
         "names" : {
            "en" : "Hungary"
         }
      },
      "location2" : "47.497700,19.079100"
   },
   "PROGRAM" : "ulogd",
   "ISODATE" : "2019-03-14T10:02:30+01:00",
   "nf_reply" : {
      "BYTES" : "599",
      "SPT" : "80",
      "PROTO" : "TCP",
      "DST" : "5.6.7.8",
      "PKTS" : "4",
      "DPT" : "35604",
      "SRC" : "1.2.3.4"
   }
}

Actually, I only need the parsed name-value pairs for Elasticsearch. The existence of this message indicates that everything is available; therefore I will filter on this message later.
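
The filtering itself can be as simple as matching on the text of the aggregated message before the Elasticsearch destination (a sketch; the filter name is my own):

filter f_session_completed {
    # match only the aggregated "Session completed" messages
    match("Session completed" value("MESSAGE"));
};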

Add reverse DNS data to OpenWRT messages with syslog-ng’s Python parser

Although the value of reverse DNS can be questionable, I still find it useful. You just need to keep in mind that the domain name returned by a reverse DNS query does not necessarily correspond to the domain name of the server you visited. What you usually get is the domain name of an edge router or a CDN’s reverse proxy that your OpenWRT connected to.

Unfortunately, syslog-ng does not support running name resolution on arbitrary macros. However, we can write such a parser in Python. Using one, I managed to resolve host names for clients and servers and add them to the session metadata. This is how they look:

   "hostname" : {
      "server" : "blog.hu",
      "client" : "testclient"
   },

There is existing Python code to do reverse DNS resolution, but it needs some changes to work on arbitrary macros. Because the current license of that post does not explicitly permit changes and redistribution of the content, I only provide a patch you may need to apply on top of it. (I am in discussion with the owner to get proper permissions and update this post accordingly.)

10a11,14
>     def init(self, options):
>         self.ip = options["ip"]
>         self.result_key = options["result"]
>         return True
16c20
<         ipaddr_b = log_message['suricata.dest_ip']
---
>         ipaddr_b = log_message[self.ip]
23c27
<             log_message['parsed.dest.hostname'] = hostname
---
>             log_message[self.result_key] = hostname

The Python parsers should come after the previous parsers. You can specify one like this.

parser p_reversedns_server {
    python(
        class("SngResolver")
        options(
            "ip" "nf_orig.DST"
            "result" "hostname.server"
        )
    );
};
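
To make the configuration above more tangible, here is an independent, minimal sketch of what such a resolver class could look like. This is my own illustration written from scratch (not the code from the post mentioned earlier), assuming syslog-ng’s Python parser API, where init() receives the options() block and parse() receives the log message:

import socket

class SngResolver(object):
    """Reverse DNS resolver usable from syslog-ng's python() parser."""

    def init(self, options):
        # The macro to resolve and the name-value pair to store the result in,
        # both taken from the options() block of the parser
        self.ip = options["ip"]
        self.result_key = options["result"]
        return True

    def parse(self, log_message):
        try:
            # values are handed over as bytes by the Python binding
            ipaddr = log_message[self.ip].decode("utf-8")
            hostname, _aliases, _addresses = socket.gethostbyaddr(ipaddr)
            log_message[self.result_key] = hostname
        except (KeyError, socket.herror, socket.gaierror):
            # resolution failed; keep the message without the hostname
            pass
        return True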

My OpenWRT router assigns domain names to clients with fixed IP addresses, therefore I do reverse lookups on client IP addresses too. If you would like to do the same, create another parser with options pointing to nf_orig.SRC and hostname.client, as shown below. Again, for the complete configuration check the GitHub repository.
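
Following the same pattern, the client-side parser could look like this:

parser p_reversedns_client {
    python(
        class("SngResolver")
        options(
            "ip" "nf_orig.SRC"
            "result" "hostname.client"
        )
    );
};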

Sending conntrack sessions from OpenWRT to Elasticsearch

Sending network traffic data of your home network from OpenWRT to syslog-ng is a great thing. But what is even cooler is to send it to Elasticsearch and create visualizations and reports. Creating nice visualizations requires proper data type mappings to be set; however, my setup lacks Logstash, which would do the data type mapping for Elasticsearch. Therefore I have to set explicit data type mappings manually, in advance.

Warning! This limitation causes a headache when you want to use rotated indexes in Elasticsearch, like network-2019.03. You may need to predefine the indexes with month suffixes in advance.

Creating data type mappings for connection session details

Explicit mappings have to be created manually using Elasticsearch’s PUT API. You can do this in Dev Tools → Console. If this is new to you, then you should check my previous step-by-step guide about it.
Because of the length of the mapping file, I provide a downloadable version on GitHub.
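
To give you an idea, a heavily trimmed-down sketch of such a mapping could look like this in the Dev Tools console (the index name, the Elasticsearch 6.x _doc type and the field selection are assumptions here; the real file on GitHub covers all fields):

PUT network-2019.03
{
  "mappings": {
    "_doc": {
      "properties": {
        "geoip2": {
          "properties": {
            "location2": { "type": "geo_point" }
          }
        },
        "nf": {
          "properties": {
            "SESSION_START":  { "type": "date" },
            "SESSION_END":    { "type": "date" },
            "SESSION_LENGTH": { "type": "integer" }
          }
        }
      }
    }
  }
}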

Monitoring my home network with examples

I have already created videos about how you can make different types of visualizations in Kibana; you should definitely check them if you are stuck. This time I picked two recent cases instead, because I wanted to focus on the benefits of having network data in Elasticsearch.

Where and how are my images uploaded by using a web printing service?

I wanted to print dozens of family photos on paper. I decided to use CEWE’s (Rossmann) service for that. They even have a Qt-based application for Linux to place orders.

I wanted to feel secure, as I was about to upload private data to someone else’s computer. Where are the photos uploaded? Do those services use HTTPS for transferring my precious family photos? Let’s find out.

Note: My actions took place between 21:00 and 22:30 on March 17th, 2019.

  • In the browser’s address bar I noticed that registration and uploading take place on “photoprintit” domains via HTTPS. Let’s check the traffic belonging to those domains. Using the wildcard query hostname.server:*photoprint* on the dashboard gave me the following results.
    [Image: Kibana dashboard showing traffic syslog from the photoprintit domain via OpenWRT]
  • In the Coordinate Map visualization in the top left corner we can see that all traffic goes to Germany (I am located in Hungary). As a result, I checked the service’s Data Privacy document, which also shows that both their web site and the hosting are located in Germany.
    [Image: Kibana Geo Map of photoprintit traffic]
    I am not surprised, as Rossmann is a German company; however, it is quite funny that I had to send my photos to Germany for printing so they would deliver the prints back to Hungary.
  • Do they use HTTPS for the traffic? Let’s use a Data Table visualization to see. It seems that there is some traffic on port 80; however, most of the traffic goes to port 443 (HTTPS).
    [Image: Kibana Data Table of photoprintit traffic details]

I did not have to do any extra work to get this information, because GeoIP, reverse DNS, traffic size and details are all there. I could do the same exercise with any other web site.

Is there any device on my network phoning home?

Is there any network traffic going to a specific country? For instance, is there anything going to China unattended? (I could have chosen any other country.)

Note: In the following examples I excluded a client host from which I made explicit connections to Chinese websites. The data covers an 11-day-long period.

  • We can filter on a country by using geoip2.country.names.en.keyword.
    [Image: Kibana dashboard with China traffic details]
  • According to the results there are two devices communicating with that country. The device tplink.lan is a WiFi access point running OpenWRT. The other device, Galaxy-A5-2016.lan, is a smartphone connecting to that access point over WiFi.
    Let’s see the details on a different Data Table visualization again.
    [Image: Kibana Data Table visualization of China traffic details]

Apparently, there is communication going outward from my home network; however, I still do not know what is inside those packets. That WiFi access point was doing NAT until I disabled it a couple of days ago, so there may be duplicate entries.

A logical next step would be to run tcpdump on OpenWRT, parameterized to capture and save only the TCP packets going to those IP addresses; see the sketch below. You can further analyze the packet captures with other tools, but that is a topic for another post I will write later.
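
Such a capture could be started like this (a sketch; the interface name, the IP addresses and the output file are placeholders, and tcpdump has to be installed with opkg first):

# capture only TCP traffic to/from the suspicious addresses into a pcap file
root@openwrt:~# tcpdump -i br-lan -w /tmp/suspicious.pcap \
    'tcp and (host 203.0.113.10 or host 203.0.113.11)'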

Software issues you may need to be aware of

There are several issues I have experienced with syslog-ng in this setup. I have already reported them to syslog-ng upstream, so I expect them to be fixed soon.

Verdict

I hope that with the help of this blog post anyone can monitor their home network with OpenWRT and syslog-ng. It is not necessary to use Elasticsearch, though; you can use any other analytics tool you like.

If this post was helpful for you, then please share it. And should you have anything to ask, feel free to comment below. I will highly appreciate it.
