Analysing Unlabeled Data with Randomness and Noise: The Case of Fishery Catch Reports


  • Aida Ashrafi
  • Bjørnar Tessem
  • Katja Enberg



Detecting violations in fishing activity reports is crucial for ensuring the sustainable use of fish resources, and machine learning methods hold promise for uncovering hidden patterns in this complex data. Because fishermen generally adhere to regulations, violations are infrequent, so identifying them is akin to an anomaly detection task. Since labeled data distinguishing normal from anomalous instances is not available for catch reports from Norwegian waters, we have opted for more conventional unsupervised approaches, such as clustering methods, to identify potential clusters and outliers. Moreover, the catch reports inherently exhibit randomness and noise due to environmental factors and potential errors made by fishermen during report registration, which complicates scaling, clustering, and anomaly detection. Through experimentation with various scaling and clustering techniques, we have observed that many of these methods tend to group the data by the species caught and show a high level of agreement in cluster formation, which indicates that the clusters are stable. Anomaly detection methods, however, yield varying sets of potential outliers, as this is a more challenging task.
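
As an illustration of what agreement in cluster formation can mean in practice, the sketch below compares two label assignments over the same reports with the adjusted Rand index. The abstract does not name the agreement measure used, so the choice of the adjusted Rand index is an assumption, and the function name and the toy labels are hypothetical, not taken from the catch report data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items.

    Returns 1.0 when the two partitions are identical up to a renaming of
    the cluster ids, and values near 0.0 when they agree no more than
    random assignments would.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions agree trivially

    # Contingency counts: items sharing a cluster in A, in B, and in both.
    joint = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_a = sum(comb(c, 2) for c in sizes_a.values())
    pairs_b = sum(comb(c, 2) for c in sizes_b.values())

    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (pairs_joint - expected) / (maximum - expected)


# Hypothetical toy labels: the same six reports clustered by two methods.
# Cluster ids differ between the methods, but the grouping is the same.
method_one = [0, 0, 0, 1, 1, 2]
method_two = [2, 2, 2, 0, 0, 1]
agreement = adjusted_rand_index(method_one, method_two)
```

Because the index ignores the cluster ids themselves, it can compare the output of different scaling and clustering combinations directly. A value close to 1 across many such pairs would be one way to quantify the stability described above.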