
How Data are Processed

Overview

Motus provides multiple datasets, each processed differently depending on the type of receiver that collected the data and the type of transmitter (Lotek or CTT) being listened for. A diagram of how these data are processed can be found in the Data Processing Pipeline section.

After processing, data are stored in the Motus Database. Public datasets available on the Motus website have broad filters applied to limit the number of false positives, while unfiltered data can be accessed through the Motus R Package. Some unfiltered data can also be viewed on the Motus website in the receiver timeline and the deployment timeline.

Lotek detection data

Lotek tags emit an OOK-modulated signal (see how tags work), which SensorGnome and CTT SensorStation receivers record as a collection of time-stamped "pulses". A large list of these pulses is then processed by the tag finder algorithm, which examines the timing between individual pulses ("pulse intervals") and matches these intervals to a list of known Lotek tag IDs. Lotek receivers do not record individual pulses; instead, they decode Lotek tag IDs internally and record the time and ID of each tag detection. Motus is currently able to decode tag IDs with tag finder on Motus servers because of an existing non-disclosure agreement (NDA) between Birds Canada and Lotek; otherwise, this codeset is kept confidential. This is why on-board decoding of Lotek tag IDs can currently only occur on Lotek receivers.

CTT detection data

CTT tags emit a binary FSK-modulated signal (see how tags work), which is recorded as the tag's ID: 8 hexadecimal characters (e.g., "1A2B3C4D"). These data are processed on CTT's server to remove false detections, using a reference list of known IDs that includes every tag ID that has been manufactured. The tag finder algorithm is not required to decode the IDs of CTT tags, because the hexadecimal ID can be decoded directly from the binary FSK signal without mapping it to a list of known tag IDs. Decoding is handled by the 434 MHz radios in the Motus receiver before the ID is stored on the receiver's computer.
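The server-side check can be pictured as a format test followed by a set-membership lookup. The Python sketch below is illustrative only: MANUFACTURED_IDS is a hypothetical stand-in for CTT's reference list, and the function name is our own, not part of any CTT or Motus API.

```python
import re

# Hypothetical stand-in for CTT's reference list of manufactured tag IDs.
MANUFACTURED_IDS = {"1A2B3C4D", "0F00BA44"}

# A CTT tag ID is 8 hexadecimal characters.
HEX_ID = re.compile(r"[0-9A-Fa-f]{8}")


def is_valid_ctt_detection(tag_id: str, manufactured=MANUFACTURED_IDS) -> bool:
    """True if tag_id is a well-formed hex ID that appears in the reference list."""
    if not HEX_ID.fullmatch(tag_id):
        return False  # malformed ID, e.g. a corrupted decode
    return tag_id.upper() in manufactured


detections = ["1A2B3C4D", "1A2B3C4E", "XYZ", "1a2b3c4d"]
kept = [d for d in detections if is_valid_ctt_detection(d)]
print(kept)  # ['1A2B3C4D', '1a2b3c4d']
```

A well-formed ID that was never manufactured ("1A2B3C4E") is rejected just like a malformed one, which is how the reference list removes false detections.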

Tracks

How public track data is calculated

Tracks are based on the shortest possible paths between detections, and are therefore unlikely to represent the true path unless the estimated speed is fairly high. The distance between detections is based on the locations of the receivers and does not take into account the detection ranges of the antennas (as much as 20 km when conditions are good). As a result, the estimated distance between detections may be too large by as much as 40 km (when two unobstructed antennas point directly at each other), so when receivers are less than 100 km apart the estimated minimum speed may be unrealistically high. Speed estimates for receivers more than 100 km apart appear to be reasonably accurate. False detections happen occasionally (especially when equipment is faulty, a receiver is near a radio source, or a tag's pulse pattern is ambiguous), but they rarely last even a second. Very short detections can be filtered out (or not) in the settings.
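A minimal Python sketch of the distance and minimum-speed calculation follows. It assumes great-circle (haversine) distances between receiver locations and a mean Earth radius of 6371 km; both are assumptions for illustration and not necessarily what the Motus website uses. The hypothetical range_km argument subtracts an assumed detection range at each end of the track.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two receiver locations, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_speed_kmh(distance_km, hours, range_km=0.0):
    """Minimum speed over the shortest path between two detections.

    Subtracting range_km at both ends models an animal detected at the
    far edge of each antenna's range (hypothetical parameter).
    """
    effective_km = max(distance_km - 2 * range_km, 0.0)
    return effective_km / hours


# Receivers 60 km apart, detections 1 hour apart:
print(min_speed_kmh(60, 1))               # 60.0 km/h from receiver locations
print(min_speed_kmh(60, 1, range_km=20))  # 20.0 km/h if both ranges are 20 km
# Receivers 300 km apart, detections 5 hours apart:
print(min_speed_kmh(300, 5), min_speed_kmh(300, 5, range_km=20))  # 60.0 52.0
```

The same 40 km correction cuts the short-track speed by two thirds but the long-track speed by only about 13%, which is why speeds between receivers more than 100 km apart are more trustworthy.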

Data Processing Pipeline

Data are processed differently depending on the type of receiver and tag. Data from Lotek tags are processed on the Motus server, where tag IDs and burst intervals are matched to known tags in the system. Similarly, CTT tag data are processed on CTT's servers, where they are validated against CTT's list of manufactured tag IDs. Because of these differences in processing, data uploaded to Motus, whether manually or automatically, must be routed to the correct server based on their contents.

The three diagrams below outline the data processing pipelines for the three types of receivers used within the Motus network.

Public Data Filters

Overview

Several filters are applied to publicly accessible Motus data to limit the number of false detections presented. These filters are also available to collaborators who download their data with the Motus R Package. This chapter describes each of these filters and how they are applied.

Motus filter

This filter was the first one to be applied to the public dataset and is the most generalised of the filters. It uses cutoffs for a set of parameters based on an empirical examination of the data. These include:

For 'noisy' sites, minimum of 5 consecutive detections
For 'quiet' sites, minimum of 4 consecutive detections

Noisy sites are defined as stations with many runs (>= 100 in an hourBin) and a high proportion of runs of length 2 (>= 85% per hourBin).
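The two cutoffs combine into a simple per-run rule. The Python sketch below assumes a hypothetical HourBin summary and treats a run's length as its number of consecutive detections; it illustrates the thresholds listed above and is not the Motus implementation.

```python
from dataclasses import dataclass


@dataclass
class HourBin:
    """Hypothetical per-station summary of one hourBin."""
    n_runs: int       # number of runs in the hourBin
    n_runs_len2: int  # number of those runs with length 2


def is_noisy(hb: HourBin) -> bool:
    # Noisy: >= 100 runs AND >= 85% of runs have length 2.
    # The n_runs check short-circuits, so there is no division by zero.
    return hb.n_runs >= 100 and hb.n_runs_len2 / hb.n_runs >= 0.85


def passes_motus_filter(run_length: int, hb: HourBin) -> bool:
    # Noisy sites need at least 5 consecutive detections, quiet sites 4.
    min_len = 5 if is_noisy(hb) else 4
    return run_length >= min_len


noisy = HourBin(n_runs=120, n_runs_len2=110)  # about 92% of runs have length 2
quiet = HourBin(n_runs=20, n_runs_len2=5)
print(passes_motus_filter(4, noisy), passes_motus_filter(4, quiet))  # False True
```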

Manual filters

There are several edge cases where the Motus filter above does not remove false detections that would be problematic to present to the public. Manual filters use pairs of tag deployment ID and station ID to remove these detections: for every entry in the filter, all detections of that tag deployment at that station are removed.
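Because each entry is a (tag deployment, station) pair, a manual filter behaves like a set lookup. The Python sketch below is illustrative; the pair values and the record field names are hypothetical.

```python
# Hypothetical manual-filter entries: (tag deployment ID, station ID) pairs.
MANUAL_FILTER = {(1001, 55), (2002, 77)}

# Hypothetical detection records; field names are illustrative.
detections = [
    {"tag_deploy_id": 1001, "station_id": 55},
    {"tag_deploy_id": 1001, "station_id": 56},
    {"tag_deploy_id": 2002, "station_id": 77},
]


def apply_manual_filter(rows, pairs):
    """Drop every detection whose (tag deployment, station) pair is listed."""
    return [r for r in rows if (r["tag_deploy_id"], r["station_id"]) not in pairs]


print(apply_manual_filter(detections, MANUAL_FILTER))
# [{'tag_deploy_id': 1001, 'station_id': 56}]
```

Tag deployment 1001 is still shown at station 56: a manual filter entry only affects the one station it names.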

Station Filters

Certain stations are especially problematic, usually because large numbers of tags deployed near the station are all present at once (see Tag Aliasing), but sometimes because the site is so noisy that it produces false detections (see Noisy Stations). These stations produce false detections too often for the manual filters to keep up. To remedy this, certain stations are flagged as 'problematic', and all detections of non-local tags at those stations are removed from public view on the Motus Dashboard only. A non-local tag is any tag deployed farther than 10 km from the station.
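The station filter reduces to a distance check between a problematic station and each tag's deployment location. The Python sketch below assumes hypothetical record fields holding both sets of coordinates and uses a haversine great-circle distance; Motus may compute the distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0
LOCAL_RADIUS_KM = 10.0  # tags deployed farther than this are non-local


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def show_on_dashboard(det, problematic_stations):
    """Hide detections of non-local tags at stations flagged as problematic."""
    if det["station_id"] not in problematic_stations:
        return True  # the station filter does not apply
    dist = haversine_km(det["station_lat"], det["station_lon"],
                        det["tag_lat"], det["tag_lon"])
    return dist <= LOCAL_RADIUS_KM


station = {"station_id": 9, "station_lat": 45.0, "station_lon": -64.0}
local = {**station, "tag_lat": 45.05, "tag_lon": -64.0}   # about 5.6 km away
distant = {**station, "tag_lat": 46.0, "tag_lon": -64.0}  # about 111 km away
print(show_on_dashboard(local, {9}), show_on_dashboard(distant, {9}))  # True False
```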

False Detections

False detections (a.k.a. false positives) are a frequent occurrence in radio tracking. In fact, every tracking technology produces erroneous data that must be handled in its own way.

How do false detections occur?

False detections in Motus can occur in several ways. The three main causes are:

Environmental noise: this can be anthropogenic or natural (from space!).
Bad metadata: researchers haven't entered deployment information for their tags, so our system doesn't know the tags are deployed. Detections of these tags are therefore interpreted as a different (false) tag.
Tag aliasing: a large number of tags (10+) are deployed at the same location and time, and their signals overlap. Because the tags are physically close together, their signals appear very similar to the receiver, making them hard to tell apart. Signals from multiple tags can then mix and be misinterpreted as another tag that is not actually present. Read more here.

How are false detections dealt with?

Public data has been processed using broad filters based on theoretical flight speeds, logical geographic/time sequences, and at least 3 consecutive tag bursts at a single station. Most tracks have not been inspected individually for accuracy. However, we do apply manual filters for known cases of false detections.

Motus R Book

Researchers inspect and filter their data based on guidelines provided in the Motus R Book.

Identifying False Detections

We use several methods to identify false positives, but in most cases the first indicator is intuition: the context of a detection often provides the most clues, or at least indicates that certain data should be scrutinized further. Typically, certain stations are particularly susceptible to false positives, which results in a cluster of detections of different animals at that location.

Examples of false positives in data

Tracks Map

Tracks are most obviously false when they show long-distance east-west movements or, during the non-migratory period, north-south movements. Some false detections recur at certain stations, creating back-and-forth movements that look false. In other instances, the detection places the animal outside its expected range and is likely false, unless the animal was tagged as part of a vagrant study (in which case we don't expect it to be in range!).

Detection timelines

Detection timelines are a very useful tool for determining when stations are functional and when they are experiencing a noise event. They show exactly when detections occur, as well as the general level of noise in the radio environment.

Detection timelines can also indicate whether a false detection was caused by environmental noise or by tag aliasing:

Environmental noise looks like a spike in tag detections, where several tags that were never detected before are all detected at the same moment and then usually never detected again. Noise events commonly occur when no other tags are being detected, which makes them stand out.
Tag aliasing only occurs when multiple other real tags are present and being detected, because aliasing results from a misinterpreted tag signal. Aliasing usually looks like one to a few tags detected at the same time as other tags that are known to be present (i.e., were deployed nearby). Aliased tags are always detected less frequently than real tags, but they can still sometimes be detected repeatedly over several days.
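The noise-spike pattern lends itself to a rough automated screen. The Python sketch below is a hypothetical heuristic, not a Motus tool: it approximates "never detected again" as "all of a tag's detections fall within a short window of the data supplied", the window and tag-count thresholds are arbitrary, and it does not attempt to identify aliasing, which requires knowledge of nearby deployments.

```python
from collections import defaultdict


def flag_noise_spikes(detections, window_s=60, min_tags=3):
    """Flag groups of one-off tags that appear together, as in a noise spike.

    detections: iterable of (tag_id, time_s) pairs from one station.
    A tag is 'one-off' if all its detections fall within window_s seconds.
    A spike is at least min_tags one-off tags first detected within the
    same window_s span. Both thresholds are hypothetical.
    """
    times = defaultdict(list)
    for tag, t in detections:
        times[tag].append(t)
    # (first detection time, tag) for tags never seen outside a short window
    one_off = sorted(
        (min(ts), tag) for tag, ts in times.items() if max(ts) - min(ts) <= window_s
    )
    spikes, i = [], 0
    while i < len(one_off):
        j = i
        # Extend the group while first detections stay within window_s of one_off[i].
        while j + 1 < len(one_off) and one_off[j + 1][0] - one_off[i][0] <= window_s:
            j += 1
        if j - i + 1 >= min_tags:
            spikes.append({tag for _, tag in one_off[i : j + 1]})
            i = j + 1
        else:
            i += 1
    return spikes


data = [("R", 0), ("R", 500), ("R", 1000), ("R", 2000),  # a real, resident tag
        ("A", 1000), ("B", 1010), ("C", 1030),         # burst of new tags
        ("D", 5000)]                                   # isolated one-off tag
print([sorted(s) for s in flag_noise_spikes(data)])  # [['A', 'B', 'C']]
```

Anything flagged this way is only a candidate for closer inspection in the detection timeline, not a confirmed false detection.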

Not sure where to find these timelines? See our chapter on detection timelines.

Reporting

We want to know when false detections are found in the public dataset. For data downloaded via the Motus R Package, please only report false detections in records where motusFilter == 1. The R package provides all data, including false positives, so that researchers can scrutinize the data we've flagged as false; anyone using those data can still apply our filters by keeping only records where motusFilter == 1.
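Researchers normally apply this selection in R with the Motus R Package. Purely as an illustration, the Python sketch below applies the same rule to rows exported to CSV; apart from motusFilter, the column names and values are hypothetical.

```python
import csv
import io

# Illustrative export: column names other than motusFilter are hypothetical.
EXPORT = """hit_id,tag_deploy_id,motusFilter
1,1001,1
2,1001,0
3,2002,1
"""


def keep_filtered(csv_text):
    """Keep only rows that pass the Motus filters (motusFilter == 1)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["motusFilter"]) == 1]


print([row["hit_id"] for row in keep_filtered(EXPORT)])  # ['1', '3']
```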

To report false detections, email us a table (or a list) that includes the following information about each false detection:

Tag deployment ID
Station deployment ID
Justification for removal
Suspected cause of the false detection
Date the false detection was found
Observer name
Any additional comments
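One way to assemble that table is as a CSV attachment. The Python sketch below is only a suggestion: the column headings mirror the list above, and every value in the example row is hypothetical.

```python
import csv
import io

COLUMNS = [
    "Tag deployment ID",
    "Station deployment ID",
    "Justification for removal",
    "Suspected cause",
    "Date found",
    "Observer name",
    "Additional comments",
]


def build_report(entries):
    """Return a CSV table of false detections, one row per entry."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


report = build_report([{
    "Tag deployment ID": 1001,
    "Station deployment ID": 5555,
    "Justification for removal": "single short run; implausible east-west jump",
    "Suspected cause": "tag aliasing",
    "Date found": "2024-05-01",
    "Observer name": "A. Researcher",
    "Additional comments": "",
}])
print(report)
```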

Tag Finder

Tag finder is a complex algorithm used to decode IDs of Lotek tags from the raw data collected by SensorGnomes and SensorStations. It was originally developed by John Brzustowski and Phil Taylor at Acadia University.

This chapter provides an overview of how tag finder works and where issues may arise. For a complete, in-depth look at the algorithm and its code, see the find_tags GitHub repository.

This chapter describes the algorithm used to decode the IDs of Lotek tags. For information on how CTT tag data are processed, see How Data are Processed.

Overview

Tag finder takes a series of time-stamped radio pulses and decodes them by matching them to a list of known Motus tag IDs. A Lotek tag ID consists of a "burst" of four precisely spaced pulses, so that every tag ID has a unique combination of three pulse gaps. Motus extends these IDs by using the interval between bursts as a fourth unique gap. Tag finder can also decode the IDs of tags whose bursts overlap, using the technique described below.
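To make the four-gap idea concrete, the Python sketch below represents a tag signature and generates the pulse times it would produce. The class and function names, the example gap values, and the convention that the burst interval runs from the first pulse of one burst to the first pulse of the next are assumptions for illustration, not taken from the find_tags code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LotekSignature:
    """Hypothetical record for one Motus-registered Lotek tag."""
    tag_id: int
    gaps_ms: tuple            # the three gaps between the four pulses of a burst
    burst_interval_ms: float  # assumed: start of one burst to start of the next


def gaps_from_pulses(times_ms):
    """Gaps between consecutive pulse timestamps."""
    return tuple(b - a for a, b in zip(times_ms, times_ms[1:]))


def expected_pulses(sig, first_pulse_ms, n_bursts):
    """Pulse times a tag would produce over n_bursts consecutive bursts."""
    times = []
    for k in range(n_bursts):
        t = first_pulse_ms + k * sig.burst_interval_ms
        times.append(t)
        for gap in sig.gaps_ms:
            t += gap
            times.append(t)
    return times


sig = LotekSignature(tag_id=1, gaps_ms=(22, 34, 57), burst_interval_ms=5000)
pulses = expected_pulses(sig, first_pulse_ms=0, n_bursts=2)
print(pulses)                        # [0, 22, 56, 113, 5000, 5022, 5056, 5113]
print(gaps_from_pulses(pulses[:4]))  # (22, 34, 57)
```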

How it works

Tag finder uses a computer science concept called deterministic finite automata (DFAs) to associate a series of time-stamped pulses with known Motus tag IDs. For any given set of pulses, it looks at the first pulse gap and creates a list of 'tag candidates' that the gap could belong to. It then examines the following pulse gaps one at a time, narrowing down the list of candidates until just one candidate is left; those pulses are then 'reserved' for that tag and cannot be associated with another tag. During this process, many pulses are skipped because their gaps don't match any known tag ID. The first skipped pulse becomes the starting point for the next DFA, making it possible to separate pulses from overlapping bursts.
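The narrowing-and-skipping idea can be illustrated with a much simpler greedy procedure. The Python sketch below is not a DFA and not the find_tags implementation: it follows a single hypothesis from the first pulse, keeps only candidates whose next expected gap fits within a tolerance, and then restarts from the skipped pulses to pull apart two overlapping bursts. The tag IDs and gap values in the example are hypothetical.

```python
def match_burst(pulses_ms, candidates, tol_ms=2.0):
    """Greedy sketch of candidate narrowing, starting from the first pulse.

    pulses_ms: sorted, non-empty list of pulse times (ms).
    candidates: {tag_id: (gap1, gap2, gap3)} expected intra-burst gaps (ms).
    Returns (remaining tag IDs, pulses used, pulses skipped).
    """
    used, skipped = [pulses_ms[0]], []
    remaining = dict(candidates)
    step = 0  # number of gaps matched so far
    for t in pulses_ms[1:]:
        if step == 3:
            skipped.append(t)  # burst complete; leave the pulse for the next pass
            continue
        gap = t - used[-1]
        fits = {tid: g for tid, g in remaining.items() if abs(g[step] - gap) <= tol_ms}
        if fits:
            remaining, step = fits, step + 1  # narrow the candidate list
            used.append(t)
        else:
            skipped.append(t)  # gap matches no remaining candidate
    return set(remaining), used, skipped


CANDIDATES = {1: (20, 30, 40), 2: (20, 35, 50), 3: (25, 30, 45)}
# Bursts of tag 1 (starting at 0 ms) and tag 3 (starting at 5 ms) overlap:
pulses = [0, 5, 20, 30, 50, 60, 90, 105]
first, used, skipped = match_burst(pulses, CANDIDATES)
second, _, _ = match_burst(skipped, CANDIDATES)  # restart from the first skipped pulse
print(first, used, skipped, second)  # {1} [0, 20, 50, 90] [5, 30, 60, 105] {3}
```

Because this sketch commits to the first pulse that fits, an unlucky early match can derail it; it is meant only to show how candidates narrow and how skipped pulses seed the next search.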

Tag finder must identify at least two bursts in order to associate pulses with a tag candidate, but the second burst is often missed because of a poor signal. To maximize the chance of obtaining a Motus tag ID, tag finder continues searching for another burst that matches the candidate tag(s) for up to 20 skipped bursts.

The list of tag candidates is based on all Motus tag IDs that are known to be deployed when the pulses occurred. This means tag finder cannot identify tags that aren't registered with Motus with a deployment.
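The candidate list is essentially a time filter over deployment records. The Python sketch below uses a hypothetical, minimal Deployment record (real Motus metadata has many more fields) to show why a tag with no registered deployment can never be matched.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical, minimal tag deployment record."""
    tag_id: int
    start: datetime
    end: Optional[datetime] = None  # None: deployment has no end date yet


def candidate_tags(deployments, pulse_time):
    """IDs of tags whose deployment covers the time of the pulses."""
    return {
        d.tag_id
        for d in deployments
        if d.start <= pulse_time and (d.end is None or pulse_time <= d.end)
    }


deployments = [
    Deployment(1, datetime(2024, 4, 1), datetime(2024, 6, 1)),
    Deployment(2, datetime(2024, 5, 10)),
    # Tag 3 is on a bird but was never given a deployment, so it is never a candidate.
]
print(sorted(candidate_tags(deployments, datetime(2024, 5, 1))))   # [1]
print(sorted(candidate_tags(deployments, datetime(2024, 5, 15))))  # [1, 2]
```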

Parameters

Tag finder uses a series of parameters to associate a pulse gap with a Motus tag ID: empirically based cutoffs for associating pulses with a tag, as well as parameters of the tag that were measured when it was first registered with Motus.

For pulses to be grouped together, they must have similar signal strength and frequency offset, and their pulse gaps must match the tag's expected pulse gaps to within 2 milliseconds. Burst intervals may vary by up to 4 milliseconds, and this tolerance increases by 1 millisecond for every skipped burst.
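The timing tolerances above can be written as two small checks. The Python sketch below covers only the timing rules (signal strength and frequency offset are not modeled), and it assumes that after k skipped bursts the elapsed time should be k + 1 burst intervals; the function and constant names are hypothetical.

```python
GAP_TOL_MS = 2.0         # pulse gaps must match within 2 ms
BURST_TOL_MS = 4.0       # burst intervals may vary by up to 4 ms ...
PER_SKIP_TOL_MS = 1.0    # ... plus 1 ms for each skipped burst
MAX_SKIPPED_BURSTS = 20  # stop searching after 20 skipped bursts


def gaps_match(observed, expected, tol=GAP_TOL_MS):
    """True if every observed pulse gap is within tol of the expected gap."""
    return len(observed) == len(expected) and all(
        abs(o - e) <= tol for o, e in zip(observed, expected)
    )


def burst_interval_matches(elapsed_ms, burst_interval_ms, skipped=0):
    """Check the time since the last matched burst, allowing missed bursts.

    Assumes the elapsed time should be (skipped + 1) burst intervals.
    """
    if skipped > MAX_SKIPPED_BURSTS:
        return False
    expected = (skipped + 1) * burst_interval_ms
    tol = BURST_TOL_MS + PER_SKIP_TOL_MS * skipped
    return abs(elapsed_ms - expected) <= tol


print(gaps_match((22.5, 33.1, 58.5), (22, 34, 57)))  # True
print(burst_interval_matches(5003, 5000))              # True  (3 ms <= 4 ms)
print(burst_interval_matches(15007, 5000, skipped=2))  # False (7 ms > 6 ms)
print(burst_interval_matches(15005, 5000, skipped=2))  # True  (5 ms <= 6 ms)
```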

See our documentation on the parameters for tag finder here.

Reprocessing receiver data

Why reprocess receiver data

Detection data is regularly reprocessed (also referred to as "rerun") following the initial upload and processing. This is primarily required to account for changes in metadata. As described here, proper metadata management is essential for identifying tags, in particular Lotek tags.

Put simply, if at the time the receiver data were originally processed a tag did not have an active deployment covering the period of the detection, that tag will not be among the candidate tags available to the tag finder algorithm when it attempts to match raw data with known tags. In other words, your tag will not be detected.

Sometimes, despite best efforts, tag metadata are either missing or incorrect when detection data are first processed. This makes repeated reprocessing of receiver data a necessity.

There is no fixed schedule for how often receivers are rerun, but typically they are rerun within 1-2 months of the first upload, then roughly every 3-6 months after that, with reruns becoming less frequent as more time elapses since the data were first recorded and processed.

Why detections can sometimes appear or change after reprocessing

Although receiver reprocessing is unavoidable given the constraints of current technology, it can lead to some surprising results, particularly when previously seen detections disappear entirely or change to different tags.

Detections don't appear after first upload but do appear after reprocessing

This is the most common and straightforward case. Usually it is a matter of researchers forgetting to update their tag metadata prior to uploading a station's data. It's also more likely to occur when deploying tags near internet-connected stations, as they often upload data before the updated metadata is present in the system.

This is the most common answer to the question of why no detections are showing up despite tags having been deployed near an active station.

Detections change to different tags after reprocessing

In this case, rather than there being no candidate tags at all that might match the raw data (as in the previous case), there are tags with active deployments that are potential candidates. Because tag signal properties are matched within an allowed tolerance, tag finder sometimes selects a less suitable candidate tag when the more suitable one is unavailable. Once the proper tag metadata has been updated and the receiver reprocessed, the detections resolve to the correct tag.
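Why a detection can switch tags is easier to see with a toy selection rule. In the Python sketch below, which is hypothetical and far simpler than tag finder's actual matching, the closest active candidate within the tolerance wins; when the true tag has no deployment, a similar tag that still fits the tolerance is chosen instead.

```python
def best_candidate(observed_gaps, active, tol_ms=2.0):
    """Return the active tag whose gaps are closest to the observation.

    active: {tag_id: expected gaps}. Closeness is the largest absolute gap
    error (a hypothetical scoring rule). Returns None if nothing is within tol_ms.
    """
    best, best_err = None, None
    for tag_id, gaps in active.items():
        err = max(abs(o - e) for o, e in zip(observed_gaps, gaps))
        if err <= tol_ms and (best_err is None or err < best_err):
            best, best_err = tag_id, err
    return best


observed = (20.2, 29.9, 40.1)
true_tag, similar_tag = {7: (20.0, 30.0, 40.0)}, {8: (21.5, 30.0, 41.0)}
print(best_candidate(observed, similar_tag))                  # 8: tag 7 not yet deployed
print(best_candidate(observed, {**true_tag, **similar_tag}))  # 7: after metadata update
```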

Detections that did appear then later disappear

This is not directly related to receiver reprocessing, but it is worth describing here along with the other cases. Here, detections are later identified as false positives, caused by either aliasing or a noise event, and are flagged in the database. Flagging them removes them from all the detection summaries on Motus.org, but not from the complete data downloaded with the R package, where the corresponding runs and hits are assigned a motusFilter value of 0. Read more about that here.
