is the total platform capacity known, but also the utilization of
individual LPARs is measured in ASUM70PR.
The problem arises when CPUs are Dedicated to an LPAR, or when Wait
Complete = Yes is used, because the dispatch time in those cases is
NOT equal to the CPU executing time. While a dispatch time of one
hour does mean that one hour of total platform capacity was used by
an LPAR, (i.e., not available to other LPARs), the actual CPU time
used by that LPAR may be a lot less than one hour. What we need is
the Wait time measured inside each MVS system, which is in the MVS
TYPE70 dataset, but each type 70 record only has a single TYPE70
segment (for the LPAR in which this MVS System executed); we do not
get a TYPE70 segment for the other LPARs. But MXG does store the
MVS Wait Time from the TYPE70 segment into variable ORIGWAIT in the
TYPE70PR observation for each LCPUADDR, which shows this data:
Wait Complete = YES example:  System SYSC (LPARCPUS=2 PARTNCPU=4)
                              LPARNUM=PARTISHN=2

             LCPU=0                              LCPU=1
          DURATM=15 min                       DURATM=15 min
|---------------------------------|-------------------------------|
        8 min            7 min                   15 min
|--------------------|------------|-------------------------------|
     Dispatched        LPAR Wait              Dispatched
    LCPUPDTM 70PR        calc                LCPUPDTM 70PR
  5 min      3 min       7 min          11 min           4 min
|----------|=========|------------|---------------------|=========|
  ORIGWAIT    BUSY     LPAR Wait        ORIGWAIT           BUSY
     70       calc       calc              70              calc
This LPAR has two LCPUs, Wait Complete=Yes, but due to the other
LPAR on this platform (that was also using Wait Complete=Yes), the
LCPU=0 was dispatched for only 8 minutes of the 15 minute interval,
while LCPU=1 was dispatched for all 15 minutes. The ORIGWAIT from
TYPE70 shows that LCPU=0 was actually CPU Busy for only 3 minutes,
and LCPU=1 was actually CPU Busy for only 4 minutes.
While there are only two LCPUs for this LPAR, this LPAR is in a
platform that has four engines, so the ASUM70PR calculation is:
PCTL2BY = (8 disp + 15 disp )/ (4*15) = 23/60 = 38%
because 38% of the dispatch capacity of the four engines in the
hardware platform was consumed by this LPAR in this interval.
However, RMF in its CPU Activity Report calculates two percentages
(and MXG replicates in both TYPE70 and TYPE70PR data):
PCTCPUBY = "LPAR Busy Time" = (3 busy + 4 busy) / (2 * 15) = 23%
PCTMVSBY = "MVS Busy Time" = (10 busy+lparwait + 4 busy)/30 = 47%
The "LPAR Busy Time" shows that this LPAR was busy for 7 of the 30
minutes that the two engines in the LPAR could have been executing,
and thus is a measure of how busy the MVS system might have been.
However, the "MVS Busy Time" calculated by IBM is at best confusing
and at worst wrong, for Wait Complete = Yes LPARs, because it
calculates the MVS busy time as DURATM minus ORIGWAIT, adding the 3
minutes busy and 7 minutes of LPAR wait from LCPU=0 to the 4 minutes
busy from LCPU=1 to conclude 14 minutes of "busy time" out of the
30 minutes that the two engines could have been executing, for 47%!
But the MVS SRM never saw those possible 30 minutes of execution; it
was dispatched for only 8 + 15 = 23 minutes, so a far more accurate
measure is "SRM Busy Time", the busy time over the dispatched time:
PCTSRMBY = "SRM Busy Time" = (3 busy + 4 busy) / 23 (dispatch) = 30%
which more accurately reflects what MVS can do with Wait Comp=Yes,
and it strongly suggests that the IBM "MVS Busy Time" is wrong for
Wait Comp=Yes. Note: Jan 2006: Using WAITCOMP=YES is no longer an
issue; only the early AMDAHL implemented that option, as I recall.
(The example used the Partition Dispatch times, but to be
slightly more precise, using the Effective Dispatch times would
show what was delivered to MVS. I am still deciding if I should
create a new variable for PCTSRMBY, but want to send this
preliminary note to MXG-L, so I will update this part of this
note at a later date.)
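The four percentages in the Wait Complete = YES example can be
re-derived with a short calculation. What follows is only a sketch in
Python, not MXG code; the dictionary and function names are
illustrative assumptions, and the minute values are the rounded
figures from the diagram above.

```python
# Minutes from the Wait Complete = YES example for System SYSC.
# Names below are illustrative; only the numbers come from the note.
DURATM = 15.0                    # interval length, minutes
PARTNCPU = 4                     # physical engines in the platform
dispatched = {0: 8.0, 1: 15.0}   # LCPUPDTM per LCPU (from TYPE70PR)
origwait = {0: 5.0, 1: 11.0}     # ORIGWAIT per LCPU (from TYPE70)


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


lcpus = sorted(dispatched)
# CPU busy inside MVS = dispatched time minus MVS wait time.
busy = {n: dispatched[n] - origwait[n] for n in lcpus}

# Share of the platform's dispatch capacity used by this LPAR.
pctl2by = pct(sum(dispatched.values()), PARTNCPU * DURATM)
# "LPAR Busy Time": busy minutes over the LPAR's two engines.
pctcpuby = pct(sum(busy.values()), len(lcpus) * DURATM)
# "MVS Busy Time": DURATM minus ORIGWAIT, so LPAR wait counts as busy.
pctmvsby = pct(sum(DURATM - origwait[n] for n in lcpus),
               len(lcpus) * DURATM)
# "SRM Busy Time": busy minutes over the minutes actually dispatched.
pctsrmby = pct(sum(busy.values()), sum(dispatched.values()))

for name, value in [("PCTL2BY", pctl2by), ("PCTCPUBY", pctcpuby),
                    ("PCTMVSBY", pctmvsby), ("PCTSRMBY", pctsrmby)]:
    print(f"{name} = {value:.1f}%")
```

The PCTMVSBY line makes the objection visible: its numerator is 14
minutes only because the 7 minutes of LPAR wait on LCPU=0 are counted
as busy, while PCTSRMBY divides the same 7 busy minutes by the 23
minutes that were actually dispatched.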
Dedicated example:  System SYSA (LPARCPUS=3 PARTNCPU=4)
                    LCPU=0    Dedicated, Wait=No
                    LCPU=1,2  Shared, Wait=No
                    LPARNUM=PARTISHN=5

         LCPU=0
     DURATM=15 min                    DURATM=15 min
|---------------------|   |------------------------------------|
                                         LCPU=1
       14:59.20              5:48.92      8:25.73    0:45.35
|---------------------|   |===============|---------|----------|
      Dispatched            Dispatched     ORIGWAIT   Non-Disp
     LCPUPDTM 70PR         LCPUPDTM 70PR      70      Non-Wait
                               BUSY                     calc
                                         LCPU=2
  3:11.51    11:48.49        5:49.20      8:25.41    0:45.39
|----------|==========|   |===============|---------|----------|
  ORIGWAIT     BUSY         Dispatched     ORIGWAIT   Non-Disp
     70        calc        LCPUPDTM 70PR      70      Non-Wait
                               BUSY                     calc
For all the three LCPUs in this LPAR, MXG calculates in ASUM70PR:
PCTL5BY = 100 * 26.62 / (4*15) = 100 * 26.62 / 60 = 44.37%
because the total dispatch time of the three LCPUs was 26.62 minutes
of the possible 60 minutes of dispatch time in the four engines of
the platform, and this is this LPAR's use of dispatch capacity.
But if we have the TYPE70PR observation from the system that has the
ORIGWAIT measurement from TYPE70 for that dedicated LCPU, we can see
the LPAR's total CPU busy time was only 11:48 + 5:48 + 5:49, or 23.4
minutes, since 3 minutes of that dispatch time was in MVS wait time!
The IBM RMF calculations for each LCPU and the total for all three
LCPUs in this LPAR show:
LCPU   PCTCPUBY  (calc)        PCTMVSBY  (calc)        Status
 0      78.72    (11:48/15)     78.72    (11:48/15)    Ded,Wait=No
 1      38.77    ( 5:48/15)     43.81    ( 6:34/15)    Shr,Wait=No
 2      38.80    ( 5:49/15)     43.84    ( 6:34/15)    Shr,Wait=No
all     52.10    (23:26/45)     55.46    (24:57/45)    Combined
For the Dedicated LCPU, PCTCPUBY and PCTMVSBY are calculated alike:
PCTCPUBY=PCTMVSBY= 100*(DURATM-ORIGWAIT)/DURATM = 78.7%
For the Shared, Non-Wait LCPUs, the "Lpar Busy Time" is
PCTCPUBY= 100*LCPUPDTM/DURATM = 38.7%
but the IBM calculation for the "MVS Busy Time" is
PCTMVSBY= 100*(DURATM-ORIGWAIT)/DURATM = 43.8%
because the PCTMVSBY value includes the 45 seconds of non-dispatched
non-wait time recorded in the MVS Busy Time calculation!
Again, while PCTCPUBY is legitimate, PCTMVSBY initially raised more
questions than it answered. Note: Jan 2006: However, now it is
used to calculate the SHORTCPS variable, and is thus useful.
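The per-LCPU percentages in the Dedicated example follow directly
from the times in the diagram. The sketch below, in Python rather
than MXG code, applies the formulas quoted in this note; the helper
names and the layout of the input table are assumptions made only
for illustration.

```python
# Times (mm:ss.hh) from the Dedicated example for System SYSA.
# Helper names and the table layout are illustrative assumptions.
def seconds(mmss):
    """Convert a 'mm:ss.hh' string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + float(secs)


DURATM = 15 * 60.0   # interval length in seconds
PARTNCPU = 4         # physical engines in the platform

# LCPU: (dedicated?, dispatched LCPUPDTM, ORIGWAIT)
lcpus = {
    0: (True, "14:59.20", "3:11.51"),
    1: (False, "5:48.92", "8:25.73"),
    2: (False, "5:49.20", "8:25.41"),
}


def pctcpuby(dedicated, disp, wait):
    """LPAR Busy: DURATM-ORIGWAIT if dedicated, else dispatch time."""
    busy = DURATM - wait if dedicated else disp
    return 100.0 * busy / DURATM


def pctmvsby(wait):
    """MVS Busy: always DURATM minus ORIGWAIT."""
    return 100.0 * (DURATM - wait) / DURATM


rows = {}
for n, (ded, disp, wait) in lcpus.items():
    d, w = seconds(disp), seconds(wait)
    rows[n] = (pctcpuby(ded, d, w), pctmvsby(w))
    print(f"LCPU {n}: PCTCPUBY={rows[n][0]:.2f} PCTMVSBY={rows[n][1]:.2f}")

# Combined values are the averages over the three LCPUs.
all_cpuby = sum(r[0] for r in rows.values()) / len(rows)
all_mvsby = sum(r[1] for r in rows.values()) / len(rows)

# ASUM70PR view: total dispatch time over platform capacity.
total_disp = sum(seconds(v[1]) for v in lcpus.values())
pctl5by = 100.0 * total_disp / (PARTNCPU * DURATM)
print(f"all: PCTCPUBY={all_cpuby:.2f} PCTMVSBY={all_mvsby:.2f}")
print(f"PCTL5BY={pctl5by:.2f}")
```

For the two shared LCPUs the gap between the two columns is exactly
the non-dispatched, non-wait time (about 45 seconds each) divided by
DURATM, which is why PCTMVSBY exceeds PCTCPUBY by about five points.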
To summarize what percentages are printed where by IBM and reported
where by MXG, on RMF CPU Activity Report, the "LPAR Busy Time Perc"
is variable PCTCPUBY, and the "MVS Busy Time Perc" is variable
PCTMVSBY in dataset TYPE70 (and now in TYPE70PR as well). On RMF's
Partition Data Report, IBM's "Logical Processors Total" is variable
LPCTnBY, and IBM's "Physical Processors Total" is PCTLnBY in dataset
ASUM70PR for each LPAR, and the "Physical Processors Total" is the
variable PCTCPUBY in ASUM70PR.
Note: I intend to revise this note as I learn more, especially for
millennium and/or MDF, in the near future. The purpose of the note
so far is to document what is calculated by MXG and by IBM when you
try to compare RMF reports to MXG datasets, and to point out basic
problems if you have Dedicated or Wait Comp = YES.
Not only is there a problem in ASUM70PR in that we do not know the
true CPU busy time, we also have assumed the "capacity" was the
DURATM of the interval, but that is not always the case, especially
when LPAR weighting is taken into account. No single percentage
value can be used, as it depends on your perspective. ASUM70PR
reports usage percentages of the "dispatch" capacity, while TYPE70
still must be used to understand what is happening inside each MVS.
2. FAT32 file system reduces space needed for MXG from 139MB to 68MB.
On Windows 95 and Windows NT with FAT File Systems, the MXG Source
Library directory DIR command shows 3549 files totaling 57.7 MB,
but the files in that directory actually required 139.1 Megabytes
of disk space! The 2GB disk drive with 32K cluster size wastes
space if the file is less than 32KBytes, and as only 272 of MXG's
source files are over 32K in size, the other 3277 small files waste
lots of disk space with large cluster size under FAT file systems.
Well that is a dead problem with the newer FAT32 file system that
virtually eliminates the space waste problem. That same source
library required only 68.23 MegaBytes on a 9GB FAT32 disk drive!
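The space waste comes from rounding every file up to a whole number
of clusters. The sketch below, in Python, shows that rounding; the
three sample file sizes and the 8 KB FAT32 cluster size are
assumptions for illustration only (this note does not state the
FAT32 cluster size), while the 3277 small-file count and the 32K
cluster size are the figures given above.

```python
# Sketch of cluster rounding; the sample sizes and the 8 KB FAT32
# cluster size are assumptions, not measurements from the MXG library.
def allocated_bytes(file_sizes, cluster_size):
    """Disk space used when each file occupies whole clusters."""
    total = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)   # ceiling division
        total += clusters * cluster_size
    return total


sample_sizes = [500, 4000, 40000]             # hypothetical files, bytes
fat16_use = allocated_bytes(sample_sizes, 32 * 1024)
fat32_use = allocated_bytes(sample_sizes, 8 * 1024)
print(sum(sample_sizes), fat16_use, fat32_use)

# Lower bound from the note's own counts: each of the 3277 non-empty
# files under 32K still occupies one full 32K cluster.
small_files_floor = 3277 * 32 * 1024
print(f"{small_files_floor / 2**20:.1f} MiB for the small files alone")
```

With 32K clusters the small files alone account for roughly 102 MiB
regardless of their contents, which is consistent with the 139.1
Megabytes reported for a library whose files total only 57.7 MB.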
III. MVS Technical Notes.
1. APAR OW25609 corrects a stoppage of SMF type 30 interval records
(subtypes 2 & 3) and type 23 records, after a serialization problem.
The APAR applies to MVS/4.3 thru OS/390 2.4.
2. APAR OW28289 changes counts in type 30 variables TAPNMNTS/TAPSMNTS
(SMF30PTM/SMF30TPR). In DF/SMS 1.2 and earlier, tape mount counts
were the number of physical mounts (actually, a count of volumes
that were verified by OPEN/CLOSE/EOV via a loadpoint read of the
VOL1 tape label). That was changed by an SPE to DF/SMS 1.2.0 (which
was included in DF/SMS 1.3.0 and 1.4.0); IBM decided instead to
count logical volumes (i.e., increment the mount count when OPEN
processing is entered with the tape drive in a ready state and with
the mounted volume at loadpoint). A document change was prepared
but never distributed, and now IBM is backing out the SPE's effect,
and with this APAR, the counts revert to physical mount counts. The
APAR's text is confusing, because it lists PTFs for DF/SMS releases
1B0, 1C0, and 1D0, which turn out to be DFSMS 1.2, 1.3, and 1.4,
respectively. If you depend on the count of tape mounts in type 30
records, you will want to apply this PTF.
3. APAR OW28613 corrects errors in the JES2 Type 26 Purge record in the
SMF26OAG Accounting Section offset. I earlier thought MXG would not
fail, but without that APAR, MXG offset validation was insufficient
and an INPUT STATEMENT EXCEEDED error occurred. Now, Change 15.330
circumvents the wrong value for SMF26OAG, but the ACCOUNTn fields in
TYPE26J2 will be blank until you install the APAR to correct IBM's
error.
Fortunately, MXG only uses the TYPE26J2 ACCOUNTn fields for jobs
that do not produce type 30s (JCL Errors or Cancel before start).
4. APAR OW28256 reports invalid CPU times measured (once again!) in RMF
type 72 field SMF72RCT (MXG Variable CPURCTTM, which is summed into
variable CPUTM); PTF was available November 14 1997. This causes
the total CPU time captured in type 72 records to exceed the total
CPU busy time, causing the Uncaptured CPU time (misnamed as CPUOVHTM
and labeled as "Overhead") to be negative in RMFINTRV. This same
field was in error in 1992, fixed then by APAR OY51878. MXG now
detects the negative value and prints this error message on the log:
"ERROR. NEGATIVE CPU-UNCAPTURED-TIME (TYPE70-TYPE72)".
See text of Change 15.238 for more details.
5. APAR OW26619 for OS/390 V2.4, in Goal Mode corrects WLM errors found
by IBM during final function test, and corrects SMF values.
6. APAR OW26421 for OS/390 V1.3 is needed only for ASMTAPES. In OS/390
IBM created two 4-byte fields for Y2K support to replace the 3-byte
fields JCTSSD and JCTJMRJD (step and job start/init dates), but I
missed that change, so ASMTAPES still used the 3-byte fields. But
IBM also zeroed the 3-byte fields, which caused INVALID DATA when
TYPETMNT was executed, and variable INITTIME had a missing value.
This APAR restores the dates in the 3-byte fields, so INITTIME will
not be missing. The ML-15 of the MXG ASMTAPES avoids the exposure
by using the 4-byte fields if they are present.
7. SYNCSORT 3.6 can ABEND 0C9 during a PROC SORT; SYNCSORT fix SY49930
is the correction.
8. APAR OW30153 corrects type 30 Measured Usage (MULC) segments. There
are multiple occurrences of the same product name and qualifier for
PRODNAME=CICS PRODQUAL=DFHKETCB in the interval records that should
have had only a single segment. There are still other errors that
are not addressed in creating the subtype 4 and subtype 5 records
from the interval records. One CICS job had 39 DFHKETCB segments in
its interval records (subtype 2 and 3), but had 37 segments in its
step termination record (subtype 4) and then had only 36 segments in
its job termination record (subtype 5). Further, the job had 12
DFHSIP segments in the interval records but had 16 segments in both
step and job terminate. Finally, the job had 2 DFHDUP segments in
the job term but none in either the interval or step term records.
A new problem has been opened with IBM on this error.
Note that old APAR OW16176, which consolidates MULC sections for
each product, should be installed. Increasing SMF buffers with
APAR OW12836 is also recommended to minimize the problems with SMF
buffers, and especially specification of DDCONS=NO in SMFPRMxx in
SYS1.PARMLIB is strongly recommended to eliminate the SMF address
algorithm to consolidate DD segments.
Note added Dec 30, 1997:
APAR PN80497 corrects a problem after applying UN84065 with Measured
Usage (MULC) that can create millions of type 30 subtype 3 records
with the same product name in the MULC segment. The problem
occurred with an IMS BMP that used MQ Series. The excess records
could cause IEE979W SMF DATA LOST - NO BUFFER SPACE AVAILABLE.
9. APAR OW30059 (PTF available 12Dec97) reports type 42 values for
Direct Write and Direct Read SMF42DWB/SMF42DRB and this APAR is
likely the fix that was originally described in note 26 in MVS
Technical Notes in MXG Newsletter THIRTY-TWO for APAR OW20926.
When the channel program did single CI reads and writes, residual
data was left in the counter that was not used.
10. APAR PQ09396 (Target 26Dec97) for MQSERIES SMF type 116 reports
inconsistencies between the 115 and 116 records' statistics.
11. APAR PQ09083 is for subtype '51'x of the FTP SMF record (VMACFTP).
The text mentions SMF Record Type 51, but there is no type 51 SMF
record (yet). The APAR corrects missing values in variables
DVGSETME/DVGSEDTE in dataset FTP51X.
12. Job Accounting for Started Tasks became available with MVS/ESA 5.1,
because you can now have a JOB card in the JCL for your STC's, and
can put ACCOUNT parameters in that JOB card that show up in MXG's
ACCOUNTn variables in PDB.JOBS/PDB.PRINT/PDB.STEPS datasets. The
JCL Reference Manual Sections 7.2, 7.3, and 16.7 discuss how.
13. What happens to measurements if I have a Y2K Test System in an LPAR?
You can use the ASUM70PR dataset and select the observations from
your production LPAR (SYSTEM='PROD') to measure the Y2K Partition's
resources, since the STARTIME of the records with SYSTEM='PROD'
will be your local time of day.
All of the records written on SYSTEM='Y2K' will have the year 2000
dates (although the READTIME value could be earlier if jobs were
read into the hold queue before IPLing with year 2000). Since the
Y2K system will be re-IPLed repetitively with the same start value
(probably 31DEC99:23:45:00), RMF interval data will appear to have
duplicate data and the jobs/steps from all IPLs will be jumbled
together, because MXG sorts RMF data by STARTIME and job data by
READTIME.
You can extract SYSTEM='Y2K' data for a specific "test run" by
finding the record number (_N_) of each SMF IPL record, using:
%INCLUDE SOURCLIB(VMACSMF);
DATA _NULL_;
  _SMF;
  IF ID=0 THEN PUT 'IPL RECORD FOUND ' _N_= SMFTIME=;
RUN;
and then use the record number of the specific IPL to select only
the SMF records desired. If you wanted the third run, and the third
IPL record had _N_=8,000 and the next IPL record had _N_=10,000, you
would use this logic:
%INCLUDE SOURCLIB(VMACSMF);
DATA _NULL_;
  _SMF;
  FILE SMFOUT DCB=SMF;
  IF 8000 LE _N_ LE 9999 THEN PUT _INFILE_;
  IF _N_ EQ 10000 THEN STOP;
RUN;
to write to //SMFOUT DD only those records for that test run.
There is an alternative. You can use the IPL PROMPT feature to
require the operator to reply with the (local) time and the reason
(describe the test run) for each IPL, and there will be a SUBTYPE=8
observation in dataset TYPE90 with variables DTIME and IPLREASN with
the operator's reply, so the TYPE90 dataset can be used to identify
the records in each test run (variable SMFRECNR, equal to _N_, was
added to the TYPE90 dataset by Change 15.267).
You must have specified PROMPT(IPLR) or PROMPT(ALL) in member
SMFPRMxx in SYS1.PARMLIB dataset to prompt the operator for the
reply at each IPL.
14. Almost-Duplicate TYPE74 records, differing only by one second in the
STARTIME, can be written by Boole & Babbage's CMF Product, if both
IPM and CPM modes are enabled. This has happened recently as sites
installed OS/390. In MXG's TYPE7xxx datasets, variable PRODUCT will
be 'CMF-IPM' in one almost-duplicate record, and 'CMF-CPM' in the
other observation. Boole does NOT recommend both modes!
15. Channel Type variable CHANTYPE in dataset TYPE73 still exists, but
variable SMF73CPD provides a better description as it describes both
ESCON and Parallel Channel types. SMF73CPD was new in MVS/ESA 5.1.
16. APAR OW27855 corrects PSF/MVS-written type 6 SMF records so that
they now contain the node number of the current node in field
SMF6ROUN, which MXG decodes into variable NODE and RMOTID in TYPE6
dataset.
17. APAR OW20844 enables JES2 job numbers greater than 32000, but has
no impact on MXG, since MXG has supported 5-digit JES Numbers thru
99999 from the JCTJOBID for several years.
IV. DB2 Technical Notes.
1. There are no DB2 Technical Notes in this newsletter.
V. IMS Technical Notes.
1. Support for Boole's IMF 3.2 (for IMS 6.1) was added in MXG 15.09.
Candle has not informed me of any changes in their ITRF product.
2. Discussion of IMS Log support in MXG Software.
I strongly recommend you use an IMS monitor (Boole or Candle)
that creates a transaction record, rather than attempt to use
IBM's IMS log for transaction response and resource measurement.
See MXG newsletter TWENTY-FIVE, IMS Technical Notes, for the MXG
position statement of the technical reasons why you cannot measure
the response time and resources (CPU, DL/I calls) for transactions
with only IBM's standard IMS log records.
However, you CAN use the TYPEIMS7 MXG program to get accurate counts
of transactions and resources by transaction name, because it uses
IMS 07 and IMS 08 log records, written for each deschedule of an IMS
program, which contain the count of IMS transactions that were run
during that program schedule (can be 1, usually is at least 5
transactions per schedule, and can be millions for WFIs), the
transaction name, and the total CPU time and DL/I calls for all
of those IMS transactions. But you cannot get accurate resources
for each individual transaction from the IMS 07/08 records. At best,
you can get the "average" of each group of transactions processed if
you are willing to divide the CPU time by the number of transactions
run, and you'll get fractional numbers of DL/I calls per transaction!
MXG Member TYPEIMFL will read the IMS log and will select and create
all possible datasets from any combination of Boole's IMF log
records (LCODE=FAx), IBM IMS log records (01,03,07,08,31x,36x,40x,
plus fastpath 59x subtypes 01,03,36x,37x,38x) and SAP IMS log
records (LCODE=AEx). Members TYPEIMFL and TYPEIMS7 both use macros
that are defined in VMACIMS to decode those IMS log records, and
which are fully supported by MXG.
It is not the reading of the IMF, IBM, and SAP IMS log records that
is the problem, but rather it is the construction of the
many-records-per-transaction-without-a-merge-key into a single
transaction record with per-transaction resources and response that
is in principle impossible with IMS log records.
Nevertheless, if you still must try to get IMS response time with
only IBM's IMS log records, because your management still won't buy
you an IMS monitor tool, then, at your own risk, you can probably
get good results with the MXG assembly program ASMIMSL5 (IMS 5) or
ASMIMSLG (IMS 4) and their JCLIMSLG example. The ASM program acts
like an IMS MPR and reads the log to figure out which records go
with which transaction, and writes a copy of the IMS log records
with an appendage to identify the transaction, and then the MXG SAS
programs invoked in JCLIMSLG read the extended IMS log records to
create dataset IMSTRAN with observations on a per-transaction basis.
These transaction records will always contain only average CPU and
DL/I calls, but the response time for each transaction is usually
quite accurate, although a few transactions may not be perfectly
matched and can have very large response times (and sometimes the
output queue time is accurately very large!). It is not guaranteed
that ASMIMSL6 will exist, but it is my hope to continue to provide
this crutch for IMS sites unwilling to purchase an IMS monitor.
VI. SAS Technical Notes.
1. There are no MXG problems using the Version 6.09 of the SAS System.
In fact, there have been no MXG problems with Version 6.08 at TS430
or later maintenance levels! Perhaps that is because MXG Software
is now a standard part of the SAS Quality Assurance test stream?
VII. CICS Technical Notes.
1. How can you use USER instead of TERMINAL to bill CICS transactions?
IBM note RTA000013242 Library item Q451666 answers the question,
"How can you use USER instead of TERMINAL to bill CICS transactions
in an ISC or MRO CICS environment (i.e., when using transaction
routing)?", by pointing out that when you specify USERSEC=IDENTIFY
or ATTACHSEC(IDENTIFY) on the SYSTEM entry or CONNECTION definition,
the USER field is then propagated into the records created in the
AOR and other regions, and thus into their observations in
CICSTRAN.CICSTRAN.
If you are billing CICS and DB2 by transactions, you really should
look at the ASUMUOW member that summarizes CICSTRAN and DB2ACCT and
their CPU times into one record per Unit of Work, reducing the
number of "things" you have to count. ASUMUOW keeps both TERMINAL
and USER as well as both CICS and DB2 CPU times plus CICS response
buckets in its output dataset PDB.ASUMUOW. If you were using
ASUMCICS to create PDB.CICS summary data, you will find ASUMUOW
preserves the CICS resource and response fields from PDB.CICS and
adds in the DB2 information. ASUMUOW replaces the earlier ANALDB2C
report program that merged DB2ACCT and CICSTRAN records.
VIII. Windows NT Technical Notes.