TAPIR Core is an ISP (carrier) independent data analysis system that receives aggregated, minimised and de-personified DNS data from TAPIR Edge devices. Core analyses this data and indicates possible anomalies as “observations”. Individual ISPs can freely choose how to act upon the observations, if at all.
What is TAPIR Core?
The main purpose of DNS TAPIR is to make DNS data transparent and available to interested parties by addressing the challenge of it being highly privacy sensitive.
To meet the information management requirements needed to handle data from DNS, DNS TAPIR has designed a system with clear boundaries of responsibility for stored data and data flows.
The critical division is between TAPIR Edge, which is the part of the system managed by a DNS resolver operator, and TAPIR Core, which is run by a Core operator, frequently an independent one.
- TAPIR Core is an ISP (carrier) independent data analysis system
- Core receives aggregated, minimised and de-personified DNS data from TAPIR Edge devices
- Core analyses this data and, based on the analysis, indicates possible anomalies as “observations”
- Individual ISPs can freely choose how to act upon the observations, if at all
- For example, an observation could indicate that the domain “evil.example.com” is both new and ramping up
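As an illustration of the last bullet, the following minimal sketch flags domains that are both new and ramping up from per-window query counts. The `Observation` record, the `observe` function, the flag names and the thresholds are all hypothetical and chosen for this example; they are not the actual TAPIR Core data format or detection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation record, not the real TAPIR Core format."""
    domain: str
    flags: tuple[str, ...]


def observe(counts: dict[str, list[int]], new_within: int = 3,
            growth: float = 2.0) -> list[Observation]:
    """Flag domains from per-window query counts (oldest window first).

    Assumed definitions, for illustration only:
    - "new": no queries in any window before the last `new_within` windows.
    - "ramping_up": counts rise strictly from the first active window and
      the latest count is at least `growth` times the first active count.
    """
    observations = []
    for domain, series in sorted(counts.items()):
        flags = []
        history, recent = series[:-new_within], series[-new_within:]
        if not any(history) and any(recent):
            flags.append("new")
        # Index of the first window in which the domain was seen at all.
        first = next((i for i, c in enumerate(series) if c > 0), None)
        active = series[first:] if first is not None else []
        increasing = all(a < b for a, b in zip(active, active[1:]))
        if len(active) >= 2 and increasing and active[-1] >= growth * active[0]:
            flags.append("ramping_up")
        if flags:
            observations.append(Observation(domain, tuple(flags)))
    return observations


counts = {
    "evil.example.com": [0, 0, 0, 0, 5, 20, 80],
    "www.example.org": [100, 110, 105, 98, 102, 101, 99],
}
for obs in observe(counts):
    print(obs.domain, obs.flags)
```

Only aggregated counts per domain are needed for this kind of observation, which is consistent with Core receiving minimised data rather than individual client queries.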
DNS messages themselves are very sparse in terms of information content but when aggregated across broad groups of clients and DNS providers, indications of criminal activity, misuse of DNS for tracking or information gathering, manipulation of the DNS system itself, and similar activities can be found.
DNS data tends to grow quickly into very large data sets, so the minimisation process that started in Edge also continues in Core. The goal is to retain as much of the information value as possible and continuously evaluate this value in relation to the amount of retained data, and then aggressively cull the data - partly as a further protection of privacy but mainly to avoid getting caught up in collecting data for the sake of collecting.
More information about the DNS TAPIR system, TAPIR Edge and Core, can be found here: https://www.dnstapir.se/info_mgmt/tapir_info_mgmt.en
TAPIR Core Input and Output
Output from TAPIR Core analysis is shared with partners immediately, and some of it is shared with the public after a delay.
Input aggregates and events originate from the Edge DNSTAP Minimiser (EDM). Core can also correlate this input with data from publicly available sources, such as malware lists.
More details of input and output format, delays and examples are found here: https://www.dnstapir.se/info_mgmt/tapir_info_mgmt.en
Data flow diagram: https://github.com/dnstapir/website/blob/main/docs/info_mgmt/tapirdataflow.md
Core Observations
- Baseline normal state of traffic
- Cookies in DNS
- (Root servers..)
DNS TAPIR system architecture overview
TAPIR Core architecture overview
Node Support Functions
Node Manager
The node manager (Nodeman) is used to enroll new nodes and to renew certificates for existing nodes. It also serves node public keys to signature verifiers, e.g., aggrec and evrec.
Aggregate Receiver
The aggregate receiver (aggrec) receives aggregates submitted by EDM and stores the raw data in an S3 compatible object store and the associated metadata in MongoDB.
Received aggregates can be retrieved either directly from the S3 compatible object store, or via the aggregate receiver HTTP API.
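The split between raw data and metadata can be sketched as follows. This is a minimal illustration, not the aggrec implementation: two in-memory dictionaries stand in for the S3 compatible object store and the MongoDB collection, and the class name, key layout and metadata fields are assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


class AggregateStore:
    """Illustrative stand-in for aggrec storage (hypothetical names and layout)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}   # stand-in for an S3 bucket
        self.metadata: dict[str, dict] = {}   # stand-in for a MongoDB collection

    def submit(self, creator: str, aggregate_type: str, payload: bytes) -> str:
        """Store the raw payload and its metadata; return the aggregate ID."""
        aggregate_id = hashlib.sha256(payload).hexdigest()
        object_key = f"{aggregate_type}/{creator}/{aggregate_id}"  # assumed layout
        self.objects[object_key] = payload
        self.metadata[aggregate_id] = {
            "creator": creator,
            "aggregate_type": aggregate_type,
            "object_key": object_key,
            "size": len(payload),
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
        return aggregate_id

    def fetch(self, aggregate_id: str) -> bytes:
        """Look up the object key in the metadata, then read the raw data."""
        return self.objects[self.metadata[aggregate_id]["object_key"]]


store = AggregateStore()
aggregate_id = store.submit("edge-1", "histogram", b"raw aggregate bytes")
print(store.metadata[aggregate_id]["object_key"])
```

Because the metadata records the object key, a consumer that has access to the object store can read the raw data directly, while other consumers go through a lookup such as `fetch`, mirroring the two retrieval paths described above.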
Core Support Functions
Event Receiver
The event receiver listens to events published on the message bus, verifies signatures and payload and republishes them on another topic.
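The verify-then-republish step might look like the sketch below. It is written under several assumptions: HMAC-SHA256 with a per-node shared key is used as a stand-in for the real signature scheme (which relies on node public keys served by Nodeman), and the topic names, the required payload fields and the `process_event` function are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical minimum set of fields an event payload must carry.
REQUIRED_FIELDS = {"type", "timestamp"}


def process_event(topic: str, payload: bytes, signature: bytes,
                  keys: dict[str, bytes],
                  out_prefix: str = "events/verified"):
    """Return (republish_topic, payload) for a valid event, else None.

    The sender is assumed to be the last element of the incoming topic.
    HMAC is a stand-in here; it is not the real signature scheme.
    """
    sender = topic.rsplit("/", 1)[-1]
    key = keys.get(sender)
    if key is None:
        return None  # unknown sender
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None  # signature check failed
    try:
        event = json.loads(payload)
    except ValueError:
        return None  # payload is not valid JSON
    if not isinstance(event, dict) or not REQUIRED_FIELDS.issubset(event):
        return None  # payload is missing required fields
    return f"{out_prefix}/{sender}", payload


keys = {"edge-1": b"demo-key"}
payload = json.dumps({"type": "new_qname", "timestamp": "2024-01-01T00:00:00Z"}).encode()
signature = hmac.new(keys["edge-1"], payload, hashlib.sha256).digest()
print(process_event("events/raw/edge-1", payload, signature, keys))
```

Republishing on a separate topic means downstream consumers only need to subscribe to the verified topic and do not have to repeat the signature check themselves.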
Slogger
Slogger (“status logger”) is a receiver service for status update reports from different Edge components. Status update reports are sent as packets on the message bus on a structured set of topics that allow the sending EdgeId and Component (e.g. TAPIR-POP (Policy Processor), TAPIR-EDM) to be identified from the topic used.
Each Edge Component may define its own set of “Functions” for which status may be reported. A status update report contains a section for each Function that has something to report. For each reporting Function, the report contains a severity, the number of consecutive events and possibly a free-form text message.
Slogger stores current and historical reports in a database and provides a management API that allows this data to be queried by consumers of the information (CLI tools, possible dashboards, etc.).
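A minimal sketch of this flow is shown below. The topic layout `status/<edge_id>/<component>`, the field names and the in-memory storage are assumptions made for illustration; the real topic structure, report format and database are not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatus:
    """One section of a status report (hypothetical field names)."""
    function: str
    severity: str
    consecutive_events: int
    message: str = ""


def parse_topic(topic: str) -> tuple[str, str]:
    """Extract (edge_id, component) from an assumed 'status/<edge>/<component>' topic."""
    prefix, edge_id, component = topic.split("/")
    if prefix != "status":
        raise ValueError(f"unexpected topic: {topic}")
    return edge_id, component


class StatusLog:
    """In-memory stand-in for the Slogger database."""

    def __init__(self) -> None:
        self.current: dict[tuple[str, str], list[FunctionStatus]] = {}
        self.history: dict[tuple[str, str], list[list[FunctionStatus]]] = defaultdict(list)

    def record(self, topic: str, report: list[FunctionStatus]) -> None:
        """Keep the latest report per (edge, component) and append it to the history."""
        key = parse_topic(topic)
        self.current[key] = report
        self.history[key].append(report)


log = StatusLog()
log.record("status/edge-1/tapir-pop",
           [FunctionStatus("mqtt", "warning", 3, "reconnecting to broker")])
print(log.current[("edge-1", "tapir-pop")])
```

Keying the stored reports by the pair taken from the topic lets a management API answer both "what is the current state of this component" and "how has it changed over time" without inspecting the report body.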
Core Infrastructure Components
MQTT Broker
A generic MQTTv5 message broker. Requires mTLS authentication for all external connections.
TODO: Document ACLs elsewhere.
MongoDB
Core requires a MongoDB compatible database, e.g. MongoDB or Amazon DocumentDB.
Object Store
Core requires an Amazon S3 compatible object store, e.g. Amazon S3, Ceph or MinIO.
Certification Authority
Core requires a Certification Authority (CA), e.g. Step CA, to manage internal certificates and certificate issuers.
Data Analysis Functions
TBD
Cogito ergo sum
Data arriving from Edge
Core Continuous Analysis Engine
The Analysis Engine is preferably run as a common, shared source of analysis in Sweden. The more Edge nodes that send data to a single engine, the more valuable the results will be. Local engines may also be set up in parallel.
- Data processing cluster (e.g., Apache Spark)
- Analysis function development environment (e.g., Jupyter Notebook)
- Job queue manager (e.g., NATS)
- Serverless Scheduler (e.g., Amazon Lambda, Fission or OpenFaaS)