Disabled anomaly rule still triggers

I disabled the anomaly rule “UBA: User has gone dormant (no activity anomaly rule)” a couple of months ago. A few days ago I started seeing a lot of events with this name (and log source “Anomaly detection Engine”). I even deleted the rule, and I still see those events constantly.
I checked whether there are other anomaly rules, but none of them are enabled.
Any idea why this could happen? I don’t have much experience with this type of rule, maybe I’m missing something.


Virtual Machine Migration

I would like to use the software to monitor an IaaS infrastructure.
I have a cluster with OpenStack installed, and I plan to collect some metrics/KPIs at the OpenStack level and at the virtual machine level.

My question is this:
Once the training phase has finished and the software is detecting anomalies, what happens if I migrate some virtual machines from one server to another?
Will I receive a lot of anomalies?
Do I have to perform the training phase again from scratch?


I am also interested in this topic. So, if PI re-trains automatically every day, how big is the training set it uses? Suppose I migrate a VM on 20 May. Then, according to your explanation, PI should not perform accurate anomaly detection before 21 May. If I am to consider the anomaly detection reliable, the training set should include at least the data from 20 May, when the change in my environment was made. Can you confirm this?
It is not totally clear to me how the reliability of the anomaly detection works. Using your heat bath metaphor, I need a stable environment configuration (in your metaphor, thermal equilibrium) in order to consider the anomaly detection as reliable. In an environment with a dynamic configuration, which changes every day by the addition, deletion or migration of resources, how can I expect an unbiased set of anomalies from PI? This is more of a technical question.

Hi Christina,
PI re-trains every day, but we analyze the data every single time we receive new data (every 5 minutes, for example). So if you move your VM on 20 May at 14:30 and there are changes, PI will detect and analyze them as soon as that data is received (as I said above, perhaps within 5 minutes). PI will use the mathematical models it last built (which could be 5 minutes before, or 23 hours before). The training set would include a sliding window of data that can range from 3 days to 28 days, depending on the algorithm. After that, we have post-processing algorithms that do long-term learning so that we can identify regularly repeating monthly events, for example.
PI learns the normal behaviour – so if an environment changes every single day then that will be “normal”. In a stable environment, PI will be able to provide a very tight fitting mathematical model to the normal behaviour – and in an environment where everything changes all the time, the mathematical model would reflect that.
The reliability comes from the fact that every single mathematical model the machine learning builds is validated to ensure that it neither over-fits nor under-fits the data. If the mathematical model is not good, the validation stage will kick the model back to the algorithm that built it to re-evaluate it. This can happen many, many times. Eventually the mathematical model is either deployed for evaluation, or abandoned. PI has seven anomaly detection algorithms, and it is possible that one or all seven could build a mathematical model for the data.
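
To make the sliding-window point concrete, here is a rough Python sketch of how much of a training window would contain post-migration data. It is only an illustration, not how PI is implemented: it assumes the retrain runs at the start of a day and uses the preceding whole days, and the function names and the year 2024 are made up for the example (the thread only says "20 May").

```python
from datetime import date, timedelta


def training_window(retrain_day, window_days):
    """First and last calendar day covered by a retrain run at the start of retrain_day."""
    last_day = retrain_day - timedelta(days=1)
    first_day = retrain_day - timedelta(days=window_days)
    return first_day, last_day


def share_after_change(retrain_day, window_days, change_day):
    """Fraction of the training window that lies on or after change_day."""
    first_day, last_day = training_window(retrain_day, window_days)
    if change_day > last_day:
        return 0.0  # the change is not in the training data yet
    start = max(first_day, change_day)
    return ((last_day - start).days + 1) / window_days


# Hypothetical year; the thread only gives "20 May".
migration = date(2024, 5, 20)
for window_days in (3, 28):
    for retrain_day in (date(2024, 5, 21), date(2024, 6, 17)):
        share = share_after_change(retrain_day, window_days, migration)
        print(window_days, retrain_day, round(share, 2))
```

Under these assumptions, the retrain on 21 May does include the data from 20 May, but with a 28-day window that day is only a small part of the training set, so the model would shift towards the new behaviour gradually rather than overnight.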



How to Interpret Anomaly Description


I have created several anomaly rules based on saved views and am having difficulty understanding the meaning of the anomaly description.

I created the following anomaly rule (specifying a single log source to evaluate):
“Anomaly detection of border FW Traffic when time series data is being aggregated by Log Source and when the average value (per interval) of Event Count (Sum) over the last 30 mins is at least 100% different from the average value (per interval) of the same property over the last 1 week”

I thought this logic would evaluate the traffic in 30-minute intervals and compare each one with the same 30-minute interval from the previous week. For example, Monday 1:00-1:30 would be compared with the previous Monday 1:00-1:30, and the rule would fire only if the value was 100% different (double). I purposely chose a 1-week span for the aggregated data because I thought this would compare like-for-like traffic and make anomalies easy to identify. However, this does not seem to be how it works. When the rule actually fired, the description stated:

“Event Count (Sum) (Log Source is %LOG SOURCE NAME%) was aggregated over 30 intervals and the aggregate value was 100% different from the average (per interval) of the same property over the last 1 week at 1:09 PM”

Note that it states 30 intervals were assessed. Does this mean it evaluated 30 intervals of 30 minutes each, i.e. 30 x 30 = 900 minutes? The description is ambiguous and the documentation I found seems light. Furthermore, the 30-minute interval appears to be a rolling window (i.e. it is not a discrete 9:00AM-9:30AM, but rather 9:01-9:31, 9:02-9:32, etc.), which makes interpretation even more difficult. We have a number of use cases where I would like to use the Anomaly and Behavioral rules, so I would really like to understand them better.

If anyone has suggestions or a better explanation it would be appreciated.
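
For what it is worth, one reading that fits the fired description is that the rule compares the average of the most recent 30 one-minute intervals with the average of all intervals in the last week, re-evaluated as the window slides, rather than comparing like-for-like slots. The Python sketch below shows that reading only. The one-minute interval length, the function names and the sample numbers are all assumptions for illustration, and the product itself may compute this differently.

```python
from statistics import mean

# Assumption: one data point per minute, so 30 intervals = 30 minutes
# and 1 week = 7 * 24 * 60 intervals. The real interval length is not
# stated in the rule text.
INTERVALS_PER_WEEK = 7 * 24 * 60
SHORT_WINDOW = 30


def percent_difference(short_avg, long_avg):
    """Percent change of the short-window average relative to the long-window average."""
    if long_avg == 0:
        return 0.0 if short_avg == 0 else float("inf")
    return abs(short_avg - long_avg) * 100.0 / long_avg


def rule_would_fire(counts, threshold_pct=100.0,
                    short_len=SHORT_WINDOW, long_len=INTERVALS_PER_WEEK):
    """counts: per-interval event counts, oldest first.

    Compares the mean of the newest short_len intervals with the mean of
    the newest long_len intervals (which include the short window).
    """
    if len(counts) < long_len:
        return False  # not enough history for the long window yet
    short_avg = mean(counts[-short_len:])
    long_avg = mean(counts[-long_len:])
    return percent_difference(short_avg, long_avg) >= threshold_pct


# A flat week at 100 events per interval, then 30 intervals at 250.
quiet_week = [100] * INTERVALS_PER_WEEK
burst = quiet_week[SHORT_WINDOW:] + [250] * SHORT_WINDOW
print(rule_would_fire(quiet_week))  # flat traffic
print(rule_would_fire(burst))       # last 30 intervals at 2.5x the usual rate
```

If this reading is right, a busy Monday afternoon is measured against the whole week's average (nights and weekends included), not against the previous Monday afternoon, and the rule can fire at any minute because the 30-interval window moves with each new data point.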


Anomaly Rule Best Practices?


While creating an anomaly rule for a saved search of authentication failures grouped by username, I ran into a few points of confusion:

1. I want to detect anomalies in activity performed at night, but the reference window used for comparison is still 24 hours. How can I detect an anomaly in night activity by comparing it with the usual night activity rather than the 24-hour average? Will the time window property test also apply to the 24-hour reference window?
2. How can I search for the events that were hit by this rule? I could not find the anomaly rule when adding a ‘Custom Rule’ filter in the Log Activity tab.
3. Lastly, how can I find the near misses? For example, I defined a threshold of 40% deviation, but a malicious activity deviated by only 30% or so and the rule did not fire. If I could somehow see the traffic pattern, I could fine-tune the rule.
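
To illustrate what points 1 and 3 are asking for, here is a small Python sketch that could be run offline on counts exported from the saved search. It is not a feature of the product: the night hours, the 25% near-miss cut-off, the function names and the sample data are all assumptions for the example. It builds the baseline from night hours only and labels each observation as fired, near miss or normal.

```python
from statistics import mean

# Assumed definition of "night"; adjust to the hours that matter to you.
NIGHT_HOURS = {22, 23, 0, 1, 2, 3, 4, 5}


def night_baseline(history):
    """history: iterable of (hour_of_day, count) pairs.

    Returns the mean count over night hours only, or None if there are none.
    """
    night_counts = [count for hour, count in history if hour in NIGHT_HOURS]
    return mean(night_counts) if night_counts else None


def classify(observed, baseline, fire_pct=40.0, near_pct=25.0):
    """Label one observation as 'fired', 'near miss' or 'normal'."""
    if not baseline:
        return "no baseline", None
    deviation = abs(observed - baseline) * 100.0 / baseline
    if deviation >= fire_pct:
        return "fired", deviation
    if deviation >= near_pct:
        return "near miss", deviation
    return "normal", deviation


# Made-up hourly failure counts for one user: quiet nights, busy days.
history = [(23, 10), (2, 10), (3, 10), (12, 200), (15, 180)]
baseline = night_baseline(history)
for observed in (11, 13, 14):
    print(observed, classify(observed, baseline))
```

Listing the "near miss" rows over a few weeks would show how close normal activity comes to the 40% threshold, which is the information needed to tune it.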