Google
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
Monitor 1: Dependency changes result in notification
How? Make sure that your team is subscribed to and reads announcement lists for all dependencies, and make sure that the team that owns each dependency knows your team is using the data.
Monitor 2: Data invariants hold in training and serving inputs
How? Using the schema constructed in test Data 1, measure whether incoming data matches the schema and alert when it diverges significantly. In practice, careful tuning of alerting thresholds is needed to balance false-positive and false-negative rates so that these alerts remain useful and actionable.
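As a rough illustration, here is a minimal Python sketch of such a check, assuming the schema is a simple per-feature dict of range and vocabulary constraints (the feature names and the 5% threshold are hypothetical and would be tuned in practice):

```python
import logging

# Hypothetical schema, e.g. derived from the schema built for test Data 1.
SCHEMA = {
    "age":     {"min": 0.0, "max": 120.0},
    "country": {"vocab": {"KR", "US", "JP"}},
}

# Threshold tuned to balance false positives against false negatives.
ALERT_THRESHOLD = 0.05  # illustrative value

def violation_rates(batch):
    """Fraction of examples in `batch` (a list of dicts) violating each feature's constraints."""
    rates = {}
    for name, spec in SCHEMA.items():
        violations = 0
        for example in batch:
            value = example.get(name)
            if value is None:
                violations += 1
            elif "min" in spec and not (spec["min"] <= value <= spec["max"]):
                violations += 1
            elif "vocab" in spec and value not in spec["vocab"]:
                violations += 1
        rates[name] = violations / max(len(batch), 1)
    return rates

def alert_on_divergence(batch):
    for name, rate in violation_rates(batch).items():
        if rate > ALERT_THRESHOLD:
            logging.warning("Schema divergence on '%s': %.1f%% of examples", name, rate * 100)
```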
Monitor 3: Training and serving features compute the same values
How? To measure this, it is crucial to log a sample of actual serving traffic. For systems that use serving input as future training data, adding identifiers to each example at serving time will allow direct comparison; the feature values should be perfectly identical at training and serving time for the same example. Important metrics to monitor here are the number of features that exhibit skew, and the number of examples exhibiting skew for each skewed feature.
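A minimal sketch of that comparison, assuming serving traffic is logged as a mapping from example id to feature dict and the same ids can be joined back to the training data (all names here are hypothetical):

```python
from collections import defaultdict

def measure_skew(serving_log, training_examples):
    """Compare feature values logged at serving time against the values used in
    training for the same example id. Returns the number of skewed features and,
    for each skewed feature, how many examples exhibit the skew."""
    skewed_examples = defaultdict(int)
    for example_id, serving_features in serving_log.items():
        training_features = training_examples.get(example_id)
        if training_features is None:
            continue  # example not present in the training data yet
        for name, serving_value in serving_features.items():
            # For the same example, values should be perfectly identical.
            if training_features.get(name) != serving_value:
                skewed_examples[name] += 1
    return len(skewed_examples), dict(skewed_examples)
```

The two return values map directly onto the metrics named above: the number of features that exhibit skew, and the number of skewed examples per skewed feature.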
Monitor 4: Models are not too stale
How? For models that re-train regularly (e.g. weekly or more often), the most obvious metric is the age of the model in production. It is also important to measure the age of the model at each stage of the training pipeline, to quickly determine where a stall has occurred and react appropriately.
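For illustration, a small sketch that checks per-stage model age against a maximum allowed age, assuming each pipeline stage records a completion timestamp (stage names, timestamps, and the weekly cadence are made up):

```python
import time

# Hypothetical unix timestamps written by each stage of the training pipeline.
STAGE_COMPLETED_AT = {
    "data_export": 1_717_200_000,
    "training": 1_717_260_000,
    "validation": 1_717_270_000,
    "pushed_to_serving": 1_717_280_000,
}

MAX_AGE_SECONDS = 7 * 24 * 3600  # e.g. a weekly retraining cadence

def check_staleness(now=None):
    now = now or time.time()
    for stage, completed_at in STAGE_COMPLETED_AT.items():
        age_hours = (now - completed_at) / 3600
        print(f"{stage}: {age_hours:.1f}h old")
        if (now - completed_at) > MAX_AGE_SECONDS:
            print(f"  ALERT: pipeline appears to be stalled at or before '{stage}'")
```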
Monitor 5: The model is numerically stable
How? Explicitly monitor the initial occurrence of any NaNs or infinities. Set plausible bounds for weights and the fraction of ReLU units in a layer returning zero values, and trigger alerts during training if these exceed appropriate thresholds.
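A possible PyTorch sketch of these checks; `WEIGHT_BOUND` and `DEAD_RELU_FRACTION` are placeholder thresholds to be tuned per model:

```python
import torch
import torch.nn as nn

WEIGHT_BOUND = 1e3        # plausible magnitude bound; tune per model
DEAD_RELU_FRACTION = 0.5  # alert if more than half of a layer's ReLU outputs are zero

def check_weights(model: nn.Module) -> None:
    """Alert on the first occurrence of NaN/inf or out-of-bounds weights."""
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            print(f"ALERT: NaN or inf in parameter '{name}'")
        elif param.abs().max() > WEIGHT_BOUND:
            print(f"ALERT: weight magnitude out of bounds in '{name}'")

def watch_dead_relus(model: nn.Module) -> None:
    """Report the fraction of zero activations for every ReLU layer on each forward pass."""
    def hook(module, inputs, output):
        zero_fraction = (output == 0).float().mean().item()
        if zero_fraction > DEAD_RELU_FRACTION:
            print(f"ALERT: {zero_fraction:.0%} of ReLU outputs are zero")
    for module in model.modules():
        if isinstance(module, nn.ReLU):
            module.register_forward_hook(hook)
```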
Monitor 6: The model has not experienced dramatic or slow-leak regressions in training speed, serving latency, throughput, or RAM usage
How? While measuring computational performance is a standard part of any monitoring setup, it is useful to slice performance metrics not just by the versions and components of code, but also by data and model versions. Degradations in computational performance may occur as dramatic changes (for which comparison to the performance of prior versions or time slices can be helpful for detection) or as slow leaks (for which a pre-set alerting threshold can be helpful for detection).
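One way this could look in practice, sketched below with hypothetical latency samples sliced by (code, model, data) version and two illustrative detection rules, one for slow leaks and one for dramatic changes:

```python
import statistics

# Hypothetical serving-latency samples (ms), sliced by (code, model, data) version.
LATENCY_MS = {
    ("v1.4", "model-0501", "data-0430"): [41, 43, 40, 44],
    ("v1.4", "model-0508", "data-0507"): [55, 57, 61, 58],
}

SLOW_LEAK_THRESHOLD_MS = 50   # pre-set absolute threshold for slow leaks
DRAMATIC_CHANGE_RATIO = 1.25  # alert when a slice is 25% slower than the previous one

def detect_regressions():
    previous = None
    for slice_key, samples in LATENCY_MS.items():
        median = statistics.median(samples)
        if median > SLOW_LEAK_THRESHOLD_MS:
            print(f"ALERT (slow leak): {slice_key} median {median}ms over threshold")
        if previous is not None and median > previous * DRAMATIC_CHANGE_RATIO:
            print(f"ALERT (dramatic change): {slice_key} median {median}ms vs previous {previous}ms")
        previous = median
```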
Monitor 7: The model has not experienced a regression in prediction quality on served data
How? There are several ways to verify that served prediction quality has not degraded due to changes in data, differing codepaths, etc.
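As one illustrative option (not the paper's prescribed method), the sketch below tracks the mean served prediction against a baseline computed offline; the baseline value and drift threshold are hypothetical:

```python
import statistics

# Baseline established offline, e.g. the mean prediction on a validation slice.
BASELINE_MEAN_PREDICTION = 0.12  # hypothetical value
MAX_RELATIVE_DRIFT = 0.2         # alert on more than 20% drift from the baseline

def check_served_predictions(served_predictions):
    """Compare the mean served prediction against the offline baseline; a large
    drift often points to changed data or a diverging serving codepath."""
    mean_prediction = statistics.fmean(served_predictions)
    drift = abs(mean_prediction - BASELINE_MEAN_PREDICTION) / BASELINE_MEAN_PREDICTION
    if drift > MAX_RELATIVE_DRIFT:
        print(f"ALERT: mean served prediction {mean_prediction:.3f} drifted "
              f"{drift:.0%} from baseline {BASELINE_MEAN_PREDICTION}")
```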
| ML-related | Ops-related |
|---|---|
| Input Data Distribution | Request Latency |
| Feature Distribution | Request Error Rate |
| Output Data Distribution | CPU, Memory Utilization |
| Performance (Evaluation) | Disk I/O |
| Model Stability | Network Traffic |
| ... | ... |
+) Google also proposes four golden signals for traditional software monitoring (see the sketch after the list):
- Latency - the time it takes for a user request to receive a response
- Traffic - the total amount of traffic the system has to handle
- Errors - the fraction of user requests that fail
- Saturation - how "full" (saturated) the system is
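A toy sketch of computing the four signals over one monitoring window from hypothetical request records, using CPU utilization as a stand-in for saturation:

```python
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    failed: bool

def golden_signals(requests, window_seconds, cpu_utilization):
    """Compute the four golden signals over one monitoring window.
    `cpu_utilization` stands in for saturation; real systems would also
    look at memory, disk I/O, queue depth, etc."""
    if not requests:
        return {"latency_p99_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": cpu_utilization}
    latencies = sorted(r.latency_ms for r in requests)
    p99 = latencies[max(int(len(latencies) * 0.99) - 1, 0)]
    return {
        "latency_p99_ms": p99,                                          # Latency
        "traffic_rps": len(requests) / window_seconds,                  # Traffic
        "error_rate": sum(r.failed for r in requests) / len(requests),  # Errors
        "saturation": cpu_utilization,                                  # Saturation
    }
```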