Production Monitoring Platform (pMp): detailed guide

pMp

pMp stands for Production Monitoring Platform. pMp provides monitoring of the Monte Carlo, ReReco and RelVal production.

Contributors

The original code is based on Igor Jurkowski and Jean-Roch Vlimant contributions. Further developed by Adrian Alan Pol, Antanas Norkus, Giovanni Franzoni and Jacob Walker. Second Edition is done by Justinas Rumsevicius.

Data in pMp

pMp collects data from McM (Monte Carlo Management service) and Stats (Stats service, PdmV's cache of RequestManager2) and displays it in graphs, histograms and tables. pMp enables users to search for information about event production based on request ids, campaigns, datasets, tags, etc. Returned results can be additionally filtered by priority, physics working group and status. Data is periodically updated from McM and Stats service using Jenkins job. Responses in pMp is cached for a short period of time, initially - 10 minutes, but this is easily configurable and may be changed at any time.

Search in pMp

Search bar in pMp can be found in all statistics pages. Next to it, there are two buttons: Show and Append. Start typing in search bar to see suggestions. For the sake of easier explanation, words entered in search bar will be called search terms. Show button clears all existing search terms (if any) and adds search term that is currently in search bar. Append button appends search term that is in search bar to existing search terms (if any). pMp allows users to search for information by:

  • Campaigns (McM)

  • Request prepids (McM)

  • PPD tags (McM)

  • Request tags (McM)

  • Flows (McM)

  • Dataset names (McM)

  • ReReco request prepids

  • ReReco processing strings

  • ReReco campaigns

  • RelVal request prepids

  • RelVal CMSSW versions

  • RelVal campaigns

Suggestions in search are shown as prepids of objects in the list above. pMp does not have wildcard (asterisk - *) search, so queries must contain complete tags, campaigns, prepids, etc, i.e. search terms must be full identifiers of objects. If multiple objects in different indexes (of different types) have the same identifier, i.e. there is a PPD tag TAG1 and a request tag TAG1 then pMp fill treat TAG1 as PPD tag and fetch only requests that have TAG1 as a PPD tag because PPD tags are higher in the list above. Objects in pMp are stored in indexes - databases for each type. Once user enters a search term, pMp first needs to figure out which type of object is user searching for and then look into appropriate index in Elasticsearch. List above shows order in which indexes are checked for an object with given id to determine object type (whether it's a campaign, PPD tag, tag, CMSSW version or etc.). Search in pMp is case-sensitive.

If multiple search terms are given then requests that fit any of these terms are shown, in other words, terms in search are joined with OR. For example if user searches by TAG1 and TAG2 then requests that have TAG1 or TAG2 (or both) are shown. If a request fits multiple terms in the search, it is still added to results only once.

Present Statistics

Present statistics in pMp shows current status of requests. Users can choose histogram scale - Linear or Logarithmic. Histogram Y values can show either number of requests, number of total events* or seconds which is calculated by multiplying time per event and number of total events*. Bars in histogram can be grouped, colored and stacked by different attributes of requests. More information about any histogram bar can be seen by hovering mouse cursor over bars in histogram. User can see which requests are in the bar by clicking on it in the histogram.

At the bottom there are three tables. First one shows requests that are at least in one chained request (attribute member_of_chains). Second table shows requests that are not members of any chained request. Third table shows all requests - chained and unchained. It is possible to switch table mode between number of requests, total events* and seconds by changing Mode of the histogram. Last column in table shows sum of all numbers in that row. If there is more than one campaign in a table, last row, called All will show sum of all numbers in that column.

* except if request is done, then number of completed events is used

Historical Statistics

Historical statistics in pMp show how number of events grew over time. Only requests with status submitted or doneare shown in this plot. There are three colors in graph - gray, orange and blue. Gray area represent requested (expected) events. Orange area represent events that are produced, but the dataset is not yet in 'VALID' state (not completed). Blue area represent events that are produced and dataset is 'VALID'. Usually gray area is a bit higher than orange and blue area. In the end, blue area should be the same height as orange area. If request is force completed (it's workflow has force-complete in request transitions) then number of expected events for that request is set to done events (blue value). User can use mouse wheel to zoom in and out and drag graph to move it.

Below the main plot there are two tables* that show all requests that were used to produce said plot. Requests that were force completed will have '(FC)' next to done events in table of done requests.

* second table can be turned on by checking 'Show list of done requests' option

Performance Statistics

Performance statistics in pMp show how much time it took for requests to get from one status to another. All data is grouped in equal width (number of seconds) bins. Lowest value in x axis is the smallest amount of time of all requests to change status while highest value is the largest amount of time. User can see which requests are in the bar by clicking on it in the histogram. Users can choose histogram scale - Linear or Logarithmic. If there are less requests than bins (bars), then number of bins (bars) is reduced to number of requests.

Options in pMp

Display chains (only in Present Statistics)

Fetch all campaigns from all chained campaigns that contain campaigns of all requests for user given search query. In other words: fetch all requests for user given search term. Take all campaigns from there requests. Find all chained campaigns for these campaigns. Take all campaigns from these chained campaigns. Fetch results for all these campaigns from chained campaigns. This may return more results than expected and can be very slow. Results may conatain up to tens of thousands of requests.

Default value: false URL parameter: chainedMode

Growing mode (only in Present Statistics)

For each submitted request create a corresponding "fake" request with same prepid that would have status done and number of done events would be number of produced events in submitted request. For example there is a submittedrequest that has 1000 expected events of which 200 are already produced. Then this request would have 800 total events (this is show as "Left" in table) and there would be another request with same prepid and status done that would have 1000 expected events and 200 done events. This option works only if plot mode is set to "Events".

Default value: false URL parameter: growingMode

Using SI suffix for large numbers

This option changes large numbers to smaller numbers with SI suffix. 1 000 becomes 1k, 1 000 000 becomes 1M, 1 000 000 000 becomes 1G. Up to two digits are shown after decimal point, for example 1234 is 1.23k, 123456 is 1.23M, 123456789 is 1.23G. If this option is enabled, actual value of number can be seen by hovering mouse cursor over shortened number.

Default value: true URL parameter: humanReadable

Estimate completed events (only in Present and Historical Statistics)

If request does not have output dataset, then try to guess how many events are produced based on subsequent requests in chain. pMp finds all chains where this request is present and finds another request down the chain which has the biggest number of completed events and uses dataset of that request as source of information.

Default value: false URL parameter: estimateCompleted

Zoom both axes (only in Historical Statistics)

If this option is enabled, zoom happens not only in x axis, but in y axis as well.

Default value: false URL parameter: zoomY

Show list of done requests (only in Historical Statistics)

If this option is enabled, a second list is shown in Historical statistics that contains all requests that are in done status.

Default value: false URL parameter: showDoneRequestsList

Granularity (only in Historical Statistics)

This option controls how many data points are in the graph, that is, how many different x values are there.

Default value: 100 URL parameter: granularity

Bins (only in Performance Statistics)

This option controls how many bins (bars) are in histogram. If number of requests is lower than number of bins, then number of requests is used rather the number of bins.

Default value: 20 URL parameter: bins

Filtering in pMp

Priority filter

Status filter

PWG filter

Sharing in pMp

Download an image

Database structure (for advanced users)

All data is taken from McM and Stats service and stored in pMp's Elasticsearch index.

Below is structure of objects kept in Elasticsearch.

Campaigns

Index - campaigns. Document type - campaign. Source - campaigns are fetched from McM.

Attributes:

  • prepid - prepid of campaign

Chained Campaigns

Index - chained_campaigns. Document type - chained_campaign. Source - chained campaigns are fetched from McM.

Attributes:

  • campaigns - list of lists which contain two elements: first element shows campaign and second one shows flow

  • prepid - prepid of chained campaign

Chained Requests

Index - chained_requests. Document type - chained_request. Source - chained requests are fetched from McM.

Attributes:

  • chain - list of request prepids that form the chain

  • member_of_campaign - name of chained campaign that this chained request is member of

  • prepid - prepid of a chained request

Dataset Names

Index - mcm_dataset_names. Document type - mcm_dataset_name. Source - dataset names are extracted from requests (attribute dataset_name) when pMp is fetching McM requests.

Attributes:

  • prepid - dataset name

Flows

Index - flows. Document type - flow. Source - flows are fetched from McM.

Attributes:

  • prepid - prepid of flow

PPD Tags

Index - ppd_tags. Document type - ppd_tag. Source - PPDtags are extracted from requests (attribute ppd_tags) when pMp is fetching McM requests.

Attributes:

  • prepid - tag itself

Processing strings

Index - processing_strings. Document type - processing_string. Source - processing strings are extracted from ReReco requests (attribute processing_string).

Attributes:

  • prepid - processing string itself

RelVal Campaigns

Index - relval_campaigns. Document type - relval_campaign. Source - RelVal campaigns are extracted from RelVal requests (attribute member_of_campaign).

Attributes:

  • prepid - prepid of RelVal campaign

RelVal CMSSW Versions

Index - relval_cmssw_versions. Document type - relval_cmssw_version. Source - RelVal CMSSW versions are extracted from RelVal requests (attribute cmssw_version).

Attributes:

  • prepid - CMSSW version itself

RelVal Requests

Index - relval_requests. Document type - relval_request. Source - RelVal requests are created when pMp is fetching workflows. If attribute request_name in workflow contains _RVCMSSW_ and prepid contains CMSSW_ then in addition to workflow, a RelVal request is created. They are created using information in workflow and they appear and have structure of a normal McM request with a few exceptions such as tags replaced by processing strings, no flow, no tags, no time per event and no member of chain.

Attributes:

  • cmssw_version - CMSSW version from workflow

  • history - list of dictionaries that have keys 'action' (value is string) and 'time' (value is int - unix timestamp) that represent actions as they would appear in McM ('submitted', 'done'). This is taken from 'reqmgr_status_history' and 'new' in ReqMgr2 will be 'submitted', 'announced' will be 'done'.

  • member_of_campaign - first campaign from ReqMgr2 workflow's list of campaigns

  • output_dataset - list of datasets that this request will produce. Only last one is used

  • prepid - prepid of rereco request

  • priority - priority of request's workflow in ReqMgr2

  • pwg - constant value 'RelVal'

  • reqmgr_name - list of one element which is workflow name that was used to create this request

  • reqmgr_status_history - list of dictionaries that have keys 'history' (value is list) and 'name' (value is string). 'history' contain list of workflow statuses from ReqMgr2 ('new', 'assignment-approved', 'announced', etc). 'name' is workflow name.

  • status - 'action' of last record in 'history' of request ('submitted', 'done', etc.)

  • total_events - number of total events of request

ReReco Campaigns

Index - rereco_campaigns. Document type - rereco_campaign. Source - ReReco campaigns are extracted from ReReco requests (attribute member_of_campaign).

Attributes:

  • prepid - prepid of ReReco campaign

ReReco Requests

Index - rereco_requests. Document type - rereco_request. Source - ReReco requests are created when pMp is fetching workflows. If attribute request_type in workflow is ReRecothen in addition to workflow, a ReReco request is created. They are created using information in workflow and they appear and have structure of a normal McM request with a few exceptions such as tags replaced by processing strings, no flow, no tags, no time per event and no member of chain.

Attributes:

  • history - list of dictionaries that have keys 'action' (value is string) and 'time' (value is int - unix timestamp) that represent actions as they would appear in McM ('submitted', 'done'). This is taken from 'reqmgr_status_history' and 'new' in ReqMgr2 will be 'submitted', 'announced' will be 'done'.

  • member_of_campaign - first campaign from ReqMgr2 workflow's list of campaigns

  • output_dataset - list of datasets that this request will produce. Only last one is used

  • prepid - prepid of rereco request

  • priority - priority of request's workflow in ReqMgr2

  • processing_string - processing string of a request

  • pwg - constant value 'ReReco'

  • reqmgr_name - list of one element which is workflow name that was used to create this request

  • reqmgr_status_history - list of dictionaries that have keys 'history' (value is list) and 'name' (value is string). 'history' contain list of workflow statuses from ReqMgr2 ('new', 'assignment-approved', 'announced', etc). 'name' is workflow name.

  • status - 'action' of last record in 'history' of request ('submitted', 'done', etc.)

  • total_events - number of total events of request

Requests

Index - requests. Document type - request. Source - requests are fetched from McM.

Attributes:

  • dataset_name - name of dataset from McM

  • flown_with - name of flow that request is associated with

  • history - list of dictionaries that have keys 'action' (value is string) and 'time' (value is int - unix timestamp) that represent actions as they appear in McM ('created', 'approved', 'submitted', 'done', etc.)

  • member_of_campaign - name of campaign that this request is member of

  • member_of_chain - list of chained requests that this request is member of

  • output_dataset - list of datasets that this request will produce. Only last one is used

  • ppd_tags - list of PPD tags

  • prepid - prepid of request

  • priority - priority of request as it appears in McM. Note that workflow priority of resubmissions may be higher

  • pwg - physics working group of request

  • reqmgr_name - list of workflow names associated with this request

  • reqmgr_status_history - list of dictionaries that have keys 'history' (value is list) and 'name' (value is string). 'history' contain list of workflow statuses from ReqMgr2 ('new', 'assignment-approved', 'announced', etc). 'name' is workflow name.

  • status - 'action' of last record in 'history' of request ('submitted', 'done', etc.)

  • tags - list of tags that request has

  • time_event - list of numbers that represent time per event for request sequences

  • total_events - number of total events of request

Tags

Index - tags. Document type - tag. Source - tags are extracted from requests (attribute tags) when pMp is fetching McM requests.

Attributes:

  • prepid - tag itself

Workflows

Index - workflows. Document type - workflow. Source - workflows are fetched from Stats.

Attributes:

  • event_number_history - list of dictionaries. Each dictionary has 'dataset' (value is string) and 'history' (value is list). Each history list entry is a dictionary that has 'events' (value is int), 'time' (value is int - unix timestamp) and 'type' (value is string). 'events' show number of events at 'time' time and 'type' shows dataset status at given 'time' ('NONE', 'PRODUCTION', 'VALID')

  • output_datasets - list of output datasets

  • prepid - workflow prepid

  • processing_string - processing string

  • request_name - workflow name

  • request_transition - list of dictionaries that have keys 'status' (value is string) and 'time' (value is int - unix timestamp). Status changes from ReqMgr2 ('new', 'assignment-approved', 'announced', etc).

  • request_type - request type ('TaskChain', 'Resubmission', 'ReReco', etc.)

  • total_events - number of expected events

Screenshots (for advanced users)

pMp2 allows to download graphs as PDF, PNG or SVG files. In order for this function to work, these two things need to be installed:

Last updated