Home is finally equipped with serious networking equipment™, but I was missing one core service: DNS.

Gaslighting is a colloquialism, loosely defined as manipulating someone so as to make them question their own reality.

According to Wikipedia

§ everyone gets a lie

OpenBSD and FreeBSD both ship unbound(8) in their base system.

Unbound since 1.16 can handle tags and views, which come in handy for serving different client hosts differently (the lying/gaslighting part). The idea is simple: a single unbound daemon runs on my home router and serves all the networks I have at home (grown-ups, kids, guests and IoT). I already had a script that generates a file to include in unbound.conf(5), so I tried patching it for this new environment, and it has been quite an adventure.

After a bit of experimenting, here’s an unbound.conf(5) example that does exactly what I need it to:

server:
  interface: 10.28.56.1
  interface: 10.29.58.1

  # performance, see https://nlnetlabs.nl/documentation/unbound/howto-optimise/
  prefetch: yes
  prefetch-key: yes
  serve-expired: yes
  rrset-cache-size: 100m
  msg-cache-size: 50m

  #crontab(5) contains:
  # ftp -o /var/unbound/db/root.hints https://www.internic.net/domain/named.cache
  root-hints: "/var/unbound/db/root.hints"

  hide-identity: yes
  hide-version: yes

  # Perform DNSSEC validation.
  auto-trust-anchor-file: "/var/unbound/db/root.key"
  val-log-level: 2

  # Synthesize NXDOMAINs from DNSSEC NSEC chains.
  # https://tools.ietf.org/html/rfc8198
  aggressive-nsec: yes

  # define all tags
  define-tag: "bad gambling nsfw home_whitelist iot_blacklist iot_whitelist"
  # sane defaults
  access-control: 0.0.0.0/0 deny
  
  # 10.28.56.0/24 querying "bad" domains get a specific reply
  #  no specifics for nsfw or gambling domains
  #  using different A replies helps identify what went well/wrong
  access-control-tag: 10.28.56.0/24 "bad"
  access-control-tag-data: 10.28.56.0/24 "bad"  "A 127.0.56.1"
  
  # 10.29.58.0/24 querying "bad or nsfw" domains get a specific reply, but we
  # will answer truthfully for domains with the home_whitelist tag
  access-control-tag: 10.29.58.0/24 "bad nsfw home_whitelist"
  access-control-tag-action: 10.29.58.0/24 "home_whitelist" always_transparent
  access-control-tag-data: 10.29.58.0/24 "bad"  "A 127.0.58.1"
  access-control-tag-data: 10.29.58.0/24 "nsfw" "A 127.0.58.2"

  # 10.30.59.0/24 are only allowed a few domains (whitelist), but not tracking
  access-control-tag: 10.30.59.0/24 "bad iot_whitelist iot_blacklist"
  local-zone-tag: . "iot_blacklist"
  local-zone:     . redirect
  access-control-tag-action: 10.30.59.0/24 "iot_whitelist" transparent
  access-control-tag-data:   10.30.59.0/24 "bad"           "A 127.0.59.1"
  access-control-tag-data:   10.30.59.0/24 "iot_blacklist" "A 127.0.59.2"
  
  # break (NXDOMAIN) use-application-dns.net (DoH canary domain)
  local-zone: use-application-dns.net static
  # unbreak laposte.fr/suivi, because they are outsourcing core functionality;
  local-zone-tag: cdn.tagcommander.com "home_whitelist"
  local-zone:     cdn.tagcommander.com redirect
  # NB: tagcommander.com ends up with the "bad" tag, but our setup above
  # overrides that for 10.29.58.0/24
  
  # The generated file is included after the rest
  include: out.lie-to-us
  include: iot_whitelist.conf

remote-control:
  control-enable: yes
  control-interface: /var/run/unbound.sock
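
Because every network/tag pair answers with its own loopback address, a reply can be traced back to the rule that produced it. Here is a minimal sketch of that idea, assuming the addresses from the example above; explain_answer is a hypothetical helper, not part of unbound(8) or lie-to-us:

```shell
# explain_answer ADDRESS: name the rule behind a synthesized A record.
# The mapping mirrors the access-control-tag-data lines of the example config.
explain_answer() {
  case "$1" in
    127.0.56.1) echo "10.28.56.0/24: bad" ;;
    127.0.58.1) echo "10.29.58.0/24: bad" ;;
    127.0.58.2) echo "10.29.58.0/24: nsfw" ;;
    127.0.59.1) echo "10.30.59.0/24: bad" ;;
    127.0.59.2) echo "10.30.59.0/24: iot_blacklist (not whitelisted)" ;;
    *)          echo "not one of our lies" ;;
  esac
}
```

From a client on 10.29.58.0/24, and assuming porn.tld carries the nsfw tag, something like explain_answer "$(dig +short porn.tld A)" should then point at the nsfw rule.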

lie-to-us produces a file that looks like:

...
local-zone: tgoogle.com redirect
local-zone-tag: tgoogle.com "bad"
local-zone: translategoogle.com redirect
local-zone-tag: translategoogle.com "bad"
local-zone: translatorgoogle.com redirect
local-zone-tag: translatorgoogle.com "bad"
local-zone: tuyulz-blogspot.googlecode.com redirect
local-zone-tag: tuyulz-blogspot.googlecode.com "bad"
local-zone: vaderkalendern.segoogle.com redirect
local-zone-tag: vaderkalendern.segoogle.com "bad"
...
#TV
local-zone-tag: netflix.com "iot_whitelist"
local-zone: netflix.com redirect
local-zone-tag: nflximg.com "iot_whitelist"
local-zone: nflximg.com redirect
...

§ but not too fast

FreeBSD and OpenBSD don’t ship GNU’s bash in their base, and I like to write scripts that “just work”™. Using only POSIX-ish shell is usually how I achieve this goal, but this time it wasn’t possible:

openbsd% time lie-to-us -o out.lie-to-us
lie-to-us -o out.lie-to-us  360.48s user 1458.17s system 87% cpu 34:49.89 total

The output file was around 2 million lines, and OpenBSD’s sh(1) obviously had serious issues looping over that many lines (while IFS= read -r _first _second _rest; do ...; done < input). Linux’ bash didn’t (it completed the run in less than 3 minutes).

§ what do I actually do?

lie-to-us did two things:

  1. fetch and sanitize domain lists for various tags
  2. merge the domain -> tag mapping to the output file format (local-zone-tag:)
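
The first step is not shown in this post. As a rough sketch of what "sanitize" could mean, assuming the fetched lists are either bare domains or hosts-style "IP domain" lines with "#" comments (both the formats and the sanitize_domains name are assumptions, not lie-to-us' actual code):

```shell
# sanitize_domains: read a raw blocklist on stdin, print one bare domain per line.
# Assumed input formats: "domain" or "IP domain" (hosts style), "#" comments.
sanitize_domains() {
  tr -d '\r' |                    # drop DOS line endings
  tr '[:upper:]' '[:lower:]' |    # DNS names are case-insensitive
  sed 's/#.*$//' |                # strip comments
  awk 'NF {
    d = $NF                       # last field: the domain in both formats
    # keep only dotted names made of letters, digits, "-" and "_"; skip bare IPs
    if (d ~ /^[a-z0-9_-]+(\.[a-z0-9_-]+)+$/ && d !~ /^[0-9.]+$/) print d
  }'
}
```

It would be used as a plain filter, for instance sanitize_domains < raw.list > bad.list (file names made up for the example).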

How can I speed that up?

§ reducing input

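Less data to comb through means it goes fast™, right? A redirect local-zone in unbound(8) answers for the zone name and for everything below it, so if we serve a lie for malware.tld, we don’t need a separate entry for foo.malware.tld. The grand plan is as follows:

  1. sort the domains so that each subdomain ends up below its parent domain
  2. walk the sorted list and keep a domain only if it is not a subdomain of the last domain we kept

Which yields: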

# sort domains so that subdomains are below their parent domain
rev input | sort | rev > input.sorted

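_prev_domain="thisshouldntmatch"
while IFS= read -r _domain; do
  case "$_domain" in
    *."$_prev_domain")
      : ;; # subdomain of the domain we are holding: skip it
    *)
      # print the domain we were holding, except for the initial placeholder
      if [ "$_prev_domain" != "thisshouldntmatch" ]; then
        printf '%s %s\n' "$_prev_domain" "$_tag" # _tag is set beforehand
      fi
      _prev_domain="$_domain" ;;
  esac
done < input.sorted > output

# don't forget the last domain!
printf '%s %s\n' "$_prev_domain" "$_tag" >> output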

Except that it’s very, very slow: on a 954k-line input, it took some 16 minutes.

Let’s look elsewhere!

awk — pattern-directed scanning and processing language

Sounds promising, even if the syntax is a bit weird for a newcomer like me. Let’s go.

BEGIN {
  dom = ""; domregex="thisshouldntmatch"
}
$0 !~ domregex {
  if(dom != "") {
    printf("%s %s\n", dom, tag)
  };
  domregex=".*\\."$0; dom=$0
}
END {
  printf("%s %s\n", dom, tag)
}

  1. The BEGIN block sets some variables.
  2. The $0 !~ domregex pattern fires when the current line is not a subdomain of the domain we are holding: it prints that held domain (never the “subdomains” skipped in between), then updates the variables. This check runs for every line of the input.
  3. The END block prints the last held domain, which no later line would otherwise flush.

If we redirect the output to our destination and feed awk(1) with a tag and the input, it’s all good!

openbsd% time awk -v tag=tag 'BEGIN { dom = ""; domregex="thisshouldntmatch" }
              $0 !~ domregex { if(dom != "") { printf("%s %s\n", dom, tag) }; domregex=".*\\."$0; dom=$0 }
              END { printf("%s %s\n", dom, tag) }' < input.sorted > output
awk -v tag=tag  < input.sorted > output  17.88s user 1.79s system 100% cpu 19.669 total

A 50× speedup, not too shabby.
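
One caveat with the regex approach: the dots in $0 are regex metacharacters and the pattern is not anchored at the end of the line, so it can in principle match more than true subdomains. Here is a stricter variant as a sketch, not the code lie-to-us ships. It compares the suffix literally and maps "." to a space while sorting so that a name such as x-malware.tld cannot slip in between malware.tld and foo.malware.tld. The dedupe_domains name is made up, and the input is assumed to be one clean domain per line:

```shell
# dedupe_domains TAG: read domains on stdin, print "domain TAG" for every
# domain that is not a subdomain of another listed domain.
dedupe_domains() {
  # Reverse each name and sort with "." mapped to a space, which sorts below
  # every character allowed in a name, so subdomains directly follow their parent.
  rev | tr '.' ' ' | LC_ALL=C sort -u | tr ' ' '.' | rev |
  awk -v tag="$1" '{
    suffix = "." kept
    n = length($0) - length(suffix)
    # literal test: does this line end in ".<kept>"?
    if (kept != "" && n > 0 && substr($0, n + 1) == suffix) next
    kept = $0
    printf("%s %s\n", kept, tag)
  }'
}
```

Because each kept domain is printed as soon as it is seen, this version needs neither a placeholder value nor an END block.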

§ squashing lists

After our tagged domains are all neatly ordered with their tag alongside them, we need to create a list of domains with all their tags.

porn.tld nsfw
nsfwmalware.tld bad
nsfwmalware.tld nsfw
googleadservices.com bad

should become:

local-zone: porn.tld redirect
local-zone-tag: porn.tld "nsfw"
local-zone: nsfwmalware.tld redirect
local-zone-tag: nsfwmalware.tld "bad nsfw"
local-zone: googleadservices.com redirect
local-zone-tag: googleadservices.com "bad"

Again, the shell version of the loop was excruciatingly slow. The awk version is incredibly fast.

$1 == domain { tags = tags " " $2 }
$1 != domain {
  if (domain != "") {
    printf("local-zone: %s redirect\nlocal-zone-tag: %s \"%s\"\n", domain, domain, tags)
  }
  domain = $1; tags = $2
}
END {
  if (domain != "") {
    printf("local-zone: %s redirect\nlocal-zone-tag: %s \"%s\"\n", domain, domain, tags)
  }
}

If our current line is about the same domain as the previous line, append the current tag to tags; else format domain and tags for the output, and update domain and tags to that of the current line. Don’t forget the last line.
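
The awk program only works if all lines for a given domain are adjacent, so the per-tag lists have to be sorted together first. Here is a sketch of the whole step as a filter, assuming single-space separated "domain tag" lines; squash_tags is a hypothetical name and the awk body is the one shown above:

```shell
# squash_tags: read "domain tag" lines on stdin, print unbound.conf(5) lines.
squash_tags() {
  # Sorting groups the lines of one domain together and drops exact duplicates.
  LC_ALL=C sort -u |
  awk '
    $1 == domain { tags = tags " " $2 }
    $1 != domain {
      if (domain != "") {
        printf("local-zone: %s redirect\nlocal-zone-tag: %s \"%s\"\n", domain, domain, tags)
      }
      domain = $1; tags = $2
    }
    END {
      if (domain != "") {
        printf("local-zone: %s redirect\nlocal-zone-tag: %s \"%s\"\n", domain, domain, tags)
      }
    }'
}
```

With one tagged file per list, something like cat bad.tagged nsfw.tagged | squash_tags > out.lie-to-us would produce the include file (the *.tagged names are made up for the example).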

§ final code

Heavily inspired by my previous work on lie-to-me, lie-to-us has a very simple interface, but only targets unbound.conf(5) this time. I also dropped lots of difficult things, such as dealing with NXDOMAINs, which can be handwritten by the operator (see example config above, or lie-to-us’ help text).

lie-to-us [-d] [-o out] [-i "domain [domain [...]]"] [tag!URL[!IP][(^|\n)tag!URL[!IP]...]]
lie-to-us -h

lie-to-us is also quite speedy now, completing its run in about 1 minute on my OpenBSD router (34× speedup!) or 47 seconds on my FreeBSD server (the one hosting this blog).

§ lessons learnt