Home > Sketch-based Change Detection

Sketch-based Change Detection


Sketch-based Change Detection 

Balachander Krishnamurthy (AT&T) 
Subhabrata Sen (AT&T)  
Yin Zhang (AT&T) 
Yan Chen (UCB/AT&T) 
 
ACM Internet Measurement Conference 2003


2  

Network Anomaly Detection 

  • Network anomalies are common
    • Flash crowds, failures, DoS, worms, ��
  • Want to catch them quickly and accurately
  • Two basic approaches
    • Signature-based: looking for known patterns
      • E.g. backscatter [Moore et al.] uses address uniformity
      • Easy to evade (e.g., mutating worms)
    • Statistics-based: looking for abnormal behavior
      • E.g., heavy hitters, big changes
      • Prior knowledge not required
 

This talk


3  

Change Detection 

  • Lots of prior work
    • Simple smoothing & forecasting
      • Exponentially weighted moving average (EWMA)
    • Box-Jenkins (ARIMA) modeling
      • Tsay, Chen/Liu (in statistics and economics)
    • Wavelet-based approach
      • Barford et al. [IMW01, IMW02]

4  

The Challenge 

  • Potentially tens of millions of time series !
    • Need to work at very low aggregation level (e.g., IP level)
      • Changes may be buried inside aggregated traffic
    • The Moore��s Law on traffic growth �� 
  • Per-flow analysis is too slow / expensive
    • Want to work in near real time
  • Existing approaches not directly applicable
    • Estan & Varghese focus on heavy-hitters
 

Need scalable change detection


5  

Sketch-based Change Detection 

  • Input stream:  (key, update)
 

Sketch 
module 

Forecast 
module(s) 

Change  
detection 
module 

(k,u) �� 

Sketches 

Error 
Sketch
 

Alarms 

  • Report flows with large forecast errors
 
  • Summarize input stream using sketches
 
  • Build forecast models on top of sketches

6  

Outline 

  • Sketch-based change detection
    • Sketch module
    • Forecast module
    • Change detection module
  • Evaluation
  • Conclusion & future work

7  

Sketch 

  • Probabilistic summary of data streams
    • Originated in STOC 1996 [AMS96]
    • Widely used in database research to handle massive data streams
 

With probabilistic guarantees (better for larger values) 

Compact 

Sketch 

100% 

Per-key state 

Hash table 

Accuracy 

Space


8  

K-ary Sketch 

  • Array of hash tables:  Tj[K]    (j = 1, ��, H)
    • Similar to count sketch, counting bloom filter, multi-stage filter, ��
 

1 

j 

H 

0 

1 

K-1 

�� 

�� 

�� 

hj(k) 

hH(k) 

h1(k) 

  • Update  (k, u):   Tj [ hj(k)] += u   (for all j)

9  

K-ary Sketch (cont��d) 

  • Estimate v(S, k): sum of updates for key k
 

compensate 
for signal loss
 

v(S, k) +  
noise
 

v(S, k)/K +  
E(noise)
 

boost 
confidence
 

unbiased estimator of v(S,k) with low variance 

1 

j 

H 

0 

1 

K-1 

�� 

�� 

�� 

hj(k) 

hH(k) 

h1(k)


10  

K-ary Sketch (cont��d) 

  • Estimate the second moment (F2
     
  • Sketches are linear
    • Can combine sketches 
       
       
    • Can aggregate data from different times, locations, and sources
 


=


11  

Forecast Model: EWMA 

  • Compute forecast error sketch: Serror
 

=                *a +             *(1-a) 

Sforecast(t) 

Sobserved(t-1) 

Sforecast(t-1) 

=                  - 

Serror(t-1) 

Sobserved(t-1) 

Sforecast(t-1) 

  • Update forecast sketch: Sforecast

12  

Other Forecast Models 

  • Simple smoothing methods
    • Moving Average (MA)
    • S-shaped Moving Average (SMA)
    • Non-Seasonal Holt-Winters (NSHW) 
  • ARIMA models (p,d,q)
    • ARIMA 0  (p  2, d=0, q  2)
    • ARIMA 1  (p  2, d=1, q  2)

13  

Find Big Changes 

  • Top N
    • Find N biggest forecast errors
    • Need to maintain a heap 
  • Thresholding
    • Find forecast errors above a threshold

14  

Evaluation Methodology 

  • Accuracy
    • Metric: similarity to per-flow analysis results
    • This talk focuses on
      • TopN (Thresholding is very similar)
      • Accuracy on real traces (Also has data-independent probabilistic accuracy guarantees)
  • Efficiency
    • Metric: time per operation
  • Dataset description
 

Netflow 

Data 

#records 

190M 

Total 

861K – 60M 

10 

4 hours 

Range 

#routers 

Duration


15  

Experimental parameters 

  50, 100, 500, 1000 


  10  (this talk: Large, Medium) 

Router 

  6 forecast models 

Model 

  1 min., 5 min. 

Interval 

  8K, 32K, 64K 


  1, 5, 9, 25 

H   

  Values 

Parameter 


16  

Accuracy 

H = 5,  K = 32768,  Router = Large 

Similarity = | TopN_sketch TopN_perflow | / N


17  

Accuracy (cont��d) 

Interval = 1 min. 

Interval = 5 min. 

Model = EWMA,  Router = Large


18  

Accuracy (cont��d) 

Router = Large 

Router = Medium 

Model = ARIMA 0,  Interval = 5 min.


19  

Accuracy Summary 

  • For small N (50, 100), even small K (8K) gives very high accuracy
  • For large N (1000), K = 32K gives about 95% accuracy
  • Router, interval, and forecast model make little difference
  • H generally has little impact

20  

Efficiency 

Nanoseconds  per operation 

146 

269 

Estimate cost 

45 

81 

Update cost 

89 

34 

Hash computation 

900MHz 
Ultrasparc-II 

400MHz 
SGI R12k 

Operation 
(H=5, K = 64K) 

Can potentially work in near real time. 

1 Gbps = 320 nsec per 40-byte packet


21  

  • Sketch-based change detection
  • Scalable 
     
     
    • Can handle tens of millions of time series
  • Accurate
    • Provable probabilistic accuracy guarantees
    • Even more accurate on real Internet traces
  • Efficient
    • Can potentially work in near real time
 

Conclusion 

Sketch 
module
 

Forecast 
module(s)
 

Change  
detection 
module
 

(k,u) �� 

Sketches 

Error 
Sketch
 

Alarms


22  

Ongoing and Future Work 

  • Refinements
    • Avoid boundary effects due to fixed interval
    • Automatically reconfigure parameters
    • Combine with sampling
  • Extensions
    • Online detection of multi-dimensional hierarchical heavy hitters and changes
  • Applications
    • Building block for network anomaly detection

23  

Thank you!


24  

Heavy Hitter vs. Change Detection 

  • Change detection for heavy hitters
    • Can potentially use different parameters for different flows
    • Heavy hitter ��  big change
      • Need small threshold to avoid missing changes
    • Aggregation is difficult
  • Sketch-based change detection
    • All flows share the same model parameters
    • Aggregation is very easy due to linearity
Search more related documents:Sketch-based Change Detection

Set Home | Add to Favorites

All Rights Reserved Powered by Free Document Search and Download

Copyright © 2011
This site does not host pdf,doc,ppt,xls,rtf,txt files all document are the property of their respective owners. complaint#nuokui.com
TOP