
[Model Monitoring] Model Monitoring for ML Projects

마메프 2022. 8. 10. 13:04

Google

The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction

 

Monitor 1: Dependency changes result in notification

How? Make sure that your team is subscribed to and reads the announcement lists for all dependencies, and make sure that the teams owning those dependencies know your team is using their data.

 

Monitor 2: Data invariants hold in training and serving inputs

How? Using the schema constructed in test Data 1, measure whether incoming data matches the schema and alert when they diverge significantly. In practice, alerting thresholds need careful tuning to balance false positive and false negative rates so that these alerts remain useful and actionable.
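As a rough illustration (not taken from the paper), the check might look like the Python sketch below. The schema format, field names, and the 1% alert threshold are illustrative assumptions; a real pipeline would typically rely on a schema library such as TensorFlow Data Validation.

```python
# Sketch of Monitor 2: check serving batches against a hand-written schema
# and alert only when the violation rate crosses a tuned threshold.
# Schema contents, field names, and the 1% threshold are illustrative.

SCHEMA = {
    "age":     {"type": float, "min": 0.0, "max": 120.0, "required": True},
    "country": {"type": str, "allowed": {"KR", "US", "JP"}, "required": True},
}

def violates_schema(example: dict) -> bool:
    for field, spec in SCHEMA.items():
        value = example.get(field)
        if value is None:
            if spec.get("required"):
                return True
            continue
        if not isinstance(value, spec["type"]):
            return True
        if "min" in spec and value < spec["min"]:
            return True
        if "max" in spec and value > spec["max"]:
            return True
        if "allowed" in spec and value not in spec["allowed"]:
            return True
    return False

def check_batch(examples: list[dict], alert_rate: float = 0.01) -> None:
    bad = sum(violates_schema(ex) for ex in examples)
    rate = bad / max(len(examples), 1)
    if rate > alert_rate:  # tune this threshold to balance false positives/negatives
        print(f"ALERT: {rate:.2%} of serving inputs violate the schema")

check_batch([{"age": 31.0, "country": "KR"}, {"age": -5.0, "country": "US"}])
```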

 

Monitor 3: Training and serving features compute the same values

How? To measure this, it is crucial to log a sample of actual serving traffic. For systems that use serving input as future training data, adding identifiers to each example at serving time will allow direct comparison; the feature values should be perfectly identical at training and serving time for the same example. Important metrics to monitor here are the number of features that exhibit skew, and the number of examples exhibiting skew for each skewed feature.
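A minimal sketch of the comparison, assuming per-example identifiers are logged at serving time; the data layout and the exact-equality check are illustrative assumptions.

```python
# Sketch of Monitor 3: join logged serving features to training features by
# example id, then count skewed features and skewed examples per feature.

from collections import Counter

def feature_skew(serving_log: dict, training_data: dict):
    """Both arguments map example_id -> {feature_name: value}."""
    skewed_examples_per_feature = Counter()
    for ex_id, served in serving_log.items():
        trained = training_data.get(ex_id)
        if trained is None:
            continue
        for name, value in served.items():
            # For the same example, the value should be identical at
            # training and serving time.
            if trained.get(name) != value:
                skewed_examples_per_feature[name] += 1
    num_skewed_features = len(skewed_examples_per_feature)
    return num_skewed_features, dict(skewed_examples_per_feature)

serving  = {"id1": {"f1": 0.5, "f2": 3}, "id2": {"f1": 0.7,  "f2": 4}}
training = {"id1": {"f1": 0.5, "f2": 3}, "id2": {"f1": 0.71, "f2": 4}}
print(feature_skew(serving, training))   # -> (1, {'f1': 1})
```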

 

Monitor 4: Models are not too stale

How? For models that re-train regularly (e.g. weekly or more often), the most obvious metric is the age of the model in production. It is also important to measure the age of the model at each stage of the training pipeline, to quickly determine where a stall has occurred and react appropriately.
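A small sketch of the staleness check; the stage names, timestamps, and one-week age limit are illustrative assumptions.

```python
# Sketch of Monitor 4: track model age in production and per pipeline stage
# so a stall can be localized quickly.

from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)   # re-training is expected at least weekly

def check_staleness(stage_completion_times: dict) -> None:
    """stage_completion_times maps pipeline stage name -> last success time."""
    now = datetime.now(timezone.utc)
    for stage, finished_at in stage_completion_times.items():
        age = now - finished_at
        if age > MAX_AGE:
            # Per-stage ages make it easy to see where the pipeline stalled.
            print(f"ALERT: stage '{stage}' last succeeded {age.days} days ago")

check_staleness({
    "data_ingestion": datetime.now(timezone.utc) - timedelta(days=1),
    "training":       datetime.now(timezone.utc) - timedelta(days=10),
    "deployed_model": datetime.now(timezone.utc) - timedelta(days=12),
})
```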

 

Monitor 5: The model is numerically stable

How? Explicitly monitor the initial occurrence of any NaNs or infinities. Set plausible bounds for weights and the fraction of ReLU units in a layer returning zero values, and trigger alerts during training if these exceed appropriate thresholds.
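A toy sketch of these checks in plain Python; the weight bound, dead-ReLU threshold, and data layout are illustrative assumptions, and in a real job this would hook into the training framework's own metrics.

```python
# Sketch of Monitor 5: detect NaN/inf or implausible weights, and flag
# layers where too many ReLU units only ever output zero.

import math

WEIGHT_BOUND = 1e3        # plausible magnitude bound for any single weight
DEAD_RELU_LIMIT = 0.5     # alert if more than half the units never activate

def check_weights(weights: list[float]) -> list[str]:
    alerts = []
    if any(math.isnan(w) or math.isinf(w) for w in weights):
        alerts.append("NaN or infinity found in weights")
    if any(abs(w) > WEIGHT_BOUND for w in weights):
        alerts.append("weight magnitude exceeds plausible bound")
    return alerts

def check_dead_relus(activations: list[list[float]]) -> list[str]:
    # activations: one list per unit, collected over a batch
    dead = sum(all(a <= 0.0 for a in unit) for unit in activations)
    if dead / max(len(activations), 1) > DEAD_RELU_LIMIT:
        return ["too many ReLU units returning only zeros"]
    return []

print(check_weights([0.1, float("nan"), 2.0]))
print(check_dead_relus([[0.0, 0.0], [1.2, 0.0], [0.0, 0.0]]))
```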

 

Monitor 6: The model has not experienced a dramatic or slow-leak regression in training speed, serving latency, throughput, or RAM usage

How? While measuring computational performance is a standard part of any monitoring, it is useful to slice performance metrics not just by the versions and components of code, but also by data and model versions. Degradations in computational performance may occur with dramatic changes (for which comparison to the performance of prior versions or time slices can be helpful for detection) or in slow leaks (for which a pre-set alerting threshold can be helpful for detection).
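A sketch of both detection modes for one metric (serving latency); the version labels, the 20% jump threshold, and the latency budget are illustrative assumptions.

```python
# Sketch of Monitor 6: flag dramatic changes (vs. the previous model/data
# version) and slow leaks (vs. a fixed budget) in serving latency.

def check_latency(history: dict, budget_ms: float = 50.0) -> None:
    """history maps model/data version label -> median latency in ms,
    in deployment order."""
    versions = list(history)
    for prev, curr in zip(versions, versions[1:]):
        jump = (history[curr] - history[prev]) / history[prev]
        if jump > 0.20:   # dramatic change: compare against the prior version
            print(f"ALERT: dramatic latency jump {prev} -> {curr}: {jump:.0%}")
    latest = versions[-1]
    if history[latest] > budget_ms:   # slow leak: compare against a pre-set threshold
        print(f"ALERT: slow leak, {latest} exceeds the {budget_ms} ms budget")

check_latency({"model_v1": 30.0, "model_v2": 32.0, "model_v3": 55.0})
```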

 

Monitor 7: The model has not experienced a regression in prediction quality on served data

How? Here are some options to make sure that there is no degradation in served prediction quality due to changes in data, differing codepaths, etc. The table below contrasts ML-side and Ops-side metrics worth monitoring; one concrete option is sketched after the table.

 

ML-related               | Ops-related
Input Data Distribution  | Request Latency
Feature Distribution     | Request Error Rate
Output Data Distribution | CPU, Memory Utilization
Performance (Evaluation) | Disk I/O
Model Stability          | Network Traffic
...                      | ...
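As one concrete option from the table above, the distribution of served model outputs can be compared against a baseline window, for example with a population stability index (PSI). The bucket count and the 0.2 alert threshold below are common rules of thumb, used here as illustrative assumptions.

```python
# Sketch of Monitor 7 (one option): compare the served output distribution
# to a baseline window with a simple population stability index.

import math

def psi(baseline: list[float], current: list[float], buckets: int = 10) -> float:
    """Population stability index between two score samples."""
    lo, hi = min(baseline), max(baseline)
    span = (hi - lo) or 1e-12            # avoid division by zero

    def fractions(values):
        counts = [0] * buckets
        for v in values:
            idx = int((v - lo) / span * buckets)
            counts[min(max(idx, 0), buckets - 1)] += 1
        return [(c + 1e-6) / len(values) for c in counts]   # smooth empty buckets

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

baseline_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
serving_scores  = [0.7, 0.8, 0.8, 0.9, 0.9, 0.95, 0.95, 0.99, 0.99, 1.0]
if psi(baseline_scores, serving_scores) > 0.2:   # 0.2 is a common rule of thumb
    print("ALERT: served output distribution drifted from the baseline")
```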

 

+) Google's four classic metrics for monitoring traditional software (the four golden signals)!

  1. Latency - the time it takes for a user request to receive a response
  2. Traffic - the total traffic the system has to handle
  3. Errors - the fraction of user requests that fail
  4. Saturation - how saturated (full) the system is
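A toy sketch of computing these four signals from a simple request log; the log format and the capacity-based saturation proxy are illustrative assumptions, and real systems usually get these values from the serving infrastructure.

```python
# Sketch: derive the four golden signals from a window of request records.

def golden_signals(requests: list[dict], capacity_rps: float, window_s: float) -> dict:
    latencies = sorted(r["latency_ms"] for r in requests)
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "latency_p99_ms": latencies[p99_index],                                   # 1. Latency
        "traffic_rps": len(requests) / window_s,                                  # 2. Traffic
        "error_rate": sum(r["status"] >= 500 for r in requests) / len(requests),  # 3. Errors
        "saturation": (len(requests) / window_s) / capacity_rps,                  # 4. Saturation
    }

request_log = [{"latency_ms": 12, "status": 200},
               {"latency_ms": 80, "status": 500},
               {"latency_ms": 15, "status": 200}]
print(golden_signals(request_log, capacity_rps=100.0, window_s=1.0))
```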
