* copyright (C) 1984-2019 merrill consultants dallas texas usa


is the total platform capacity known, but also the utilization of

individual LPARs is measured in ASUM70PR.

The problem arises when CPUs are Dedicated to an LPAR, or when Wait

Complete = Yes is used, because the dispatch time in those cases is

NOT equal to the CPU executing time. While a dispatch time of one

hour does mean that one hour of total platform capacity was used by

an LPAR, (i.e., not available to other LPARs), the actual CPU time

used by that LPAR may be a lot less than one hour. What we need is

the Wait time measured inside each MVS system, which is in the MVS

TYPE70 dataset, but each type 70 record only has a single TYPE70

segment (for the LPAR in which this MVS System executed); we do not

get a TYPE70 segment for the other LPARs. But MXG does store the

MVS Wait Time from the TYPE70 segment into variable ORIGWAIT in the

TYPE70PR observation for each LCPUADDR, which shows this data:

Wait Complete = YES example: System SYSC (LPARCPUS=2 PARTNCPU=4)
                             LPARNUM=PARTISHN=2

             LCPU=0                             LCPU=1
          DURATM=15 min                      DURATM=15 min
|---------------------------------|-------------------------------|
        8 min            7 min                  15 min
|--------------------|------------|-------------------------------|
     Dispatched        LPAR Wait             Dispatched
     LCPUPDTM 70PR     calc                  LCPUPDTM 70PR
   5 min     3 min      7 min             11 min          4 min
|----------|=========|------------|---------------------|=========|
 ORIGWAIT    BUSY      LPAR Wait        ORIGWAIT          BUSY
 70          calc      calc             70                calc

This LPAR has two LCPUs, Wait Complete=Yes, but due to the other

LPAR on this platform (that was also using Wait Complete=Yes), the

LCPU=0 was dispatched for only 8 minutes of the 15 minute interval,

while LCPU=1 was dispatched for all 15 minutes. The ORIGWAIT from

TYPE70 shows that LCPU=0 was actually CPU Busy for only 3 minutes,

and LCPU=1 was actually CPU Busy for only 4 minutes.

While there are only two LCPUs for this LPAR, this LPAR is in a

platform that has four engines, so the ASUM70PR calculation is:

PCTL2BY = (8 disp + 15 disp )/ (4*15) = 23/60 = 38%

because 38% of the dispatch capacity of the four engines in the

hardware platform was consumed by this LPAR in this interval.
However, RMF in its CPU Activity Report calculates two percentages

(and MXG replicates in both TYPE70 and TYPE70PR data):

PCTCPUBY = "LPAR Busy Time" = (3 busy + 4 busy) / (2 * 15) = 23%
PCTMVSBY = "MVS Busy Time" = (10 busy+lparwait + 4 busy)/30 = 47%

The "LPAR Busy Time" shows that this LPAR was busy for 7 of the 30
minutes that the two engines in the LPAR could have been executing,
and thus is a measure of how busy the MVS system might have been.

However, the "MVS Busy Time" calculated by IBM is at best confusing
and at worst wrong, for Wait Completion = Yes LPARs, because it
calculates the MVS busy time as DURATM minus ORIGWAIT, adding the 3
minutes busy and 7 minutes of LPAR wait from LCPU=0 to the 4 minutes
busy from LCPU=1 to conclude 14 minutes of "busy time" out of the
30 minutes that the two engines could have been executing, for 47%!

But the MVS SRM never saw those possible 30 minutes of execution; it

was dispatched for only 8 + 15 = 23 minutes, so a far more accurate

measure is "SRM Busy Time", the busy time over the dispatched time:
PCTSRMBY = "SRM Busy Time" = (3 busy + 4 busy) / 23 (dispatch) = 30%
which more accurately reflects what MVS can do with Wait Comp=Yes,

and it strongly suggests that the IBM "MVS Busy Time" is wrong for

Wait Comp=Yes. Note: Jan 2006: Using WAITCOMP=YES is no longer an

issue; only the early AMDAHL implemented that option, as I recall.

(The example used the Partition Dispatch times, but to be

slightly more precise, using the Effective Dispatch times would

show what was delivered to MVS. I am still deciding if I should

create a new variable for PCTSRMBY, but want to send this

preliminary note to MXG-L, so I will update this part of this

note at a later date.)
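The four percentages in the Wait Complete = YES example can be
reproduced from the per-LCPU minutes in the diagram. The Python sketch
below only illustrates the arithmetic; the data layout and the function
name are assumptions made for this example, not MXG code, and PCTSRMBY
is the proposed measure discussed above, not an existing MXG variable.

```python
# Minutes per LCPU for system SYSC, taken from the diagram in this note.
# The list-of-dicts layout is hypothetical; MXG keeps these values in
# LCPUPDTM (dispatch time) and ORIGWAIT (MVS wait time) in TYPE70PR.
DURATM = 15.0     # interval length, minutes
PARTNCPU = 4      # physical engines in the platform
lcpus = [
    {"dispatch": 8.0, "origwait": 5.0},     # LCPU=0
    {"dispatch": 15.0, "origwait": 11.0},   # LCPU=1
]


def percentages(lcpus, duratm, partncpu):
    dispatch = sum(c["dispatch"] for c in lcpus)
    # True CPU busy is the dispatch time minus the wait seen inside MVS.
    busy = sum(c["dispatch"] - c["origwait"] for c in lcpus)
    # IBM's "MVS Busy" is DURATM minus ORIGWAIT, so any time the LCPU
    # was not dispatched (LPAR wait) is counted as busy.
    mvs_busy = sum(duratm - c["origwait"] for c in lcpus)
    n = len(lcpus)
    return {
        "PCTL2BY": 100 * dispatch / (partncpu * duratm),
        "PCTCPUBY": 100 * busy / (n * duratm),
        "PCTMVSBY": 100 * mvs_busy / (n * duratm),
        "PCTSRMBY": 100 * busy / dispatch,
    }


result = percentages(lcpus, DURATM, PARTNCPU)
for name, value in result.items():
    print(f"{name} = {value:.1f}%")
```

With these inputs the "MVS Busy Time" evaluates to 14/30, and the only
difference between PCTCPUBY and PCTSRMBY is the denominator: the 30
minutes the two LCPUs might have run versus the 23 minutes they were
actually dispatched.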

Dedicated example: System SYSA (LPARCPUS=3 PARTNCPU=4)
                   LCPU=0   Dedicated, Wait=No
                   LCPU=1,2 Shared,    Wait=No
                   LPARNUM=PARTISHN=5

        LCPU=0                             LCPU=1
     DURATM=15 min                      DURATM=15 min
|---------------------|     |------------------------------------|
      14:59.20                  5:48.92       8:25.73    0:45.35
|---------------------|     |===============|---------|----------|
     Dispatched                Dispatched    ORIGWAIT   Non-Disp
     LCPUPDTM 70PR             LCPUPDTM 70PR 70         Non-Wait
                               BUSY                     calc

                                           LCPU=2
  3:11.51    11:48.49           5:49.20       8:25.41    0:45.39
|----------|==========|     |===============|---------|----------|
 ORIGWAIT    BUSY              Dispatched    ORIGWAIT   Non-Disp
 70          calc              LCPUPDTM 70PR 70         Non-Wait
                               BUSY                     calc

For all the three LCPUs in this LPAR, MXG calculates in ASUM70PR:

PCTL5BY = 100 * 26.62 / (4*15) = 100 * 26.62 / 60 = 44.37%

because the total dispatch time of the three LCPUs was 26.62 minutes
(14:59.20 + 5:48.92 + 5:49.20 = 26:37.32) of the possible 60 minutes
of dispatch time in the four engines of the platform, and this is
this LPAR's use of dispatch capacity.

But if we have the TYPE70PR observation from the system that has the
ORIGWAIT measurement from TYPE70 for that dedicated LCPU, we can see
the LPAR's total CPU busy time was only 11:48 + 5:48 + 5:49, or 23.4
minutes, since 3 minutes of that dispatch time was in MVS wait time!

The IBM RMF calculations for each LCPU and the total for all three

LCPUs in this LPAR show:

LCPU   PCTCPUBY  (calc)        PCTMVSBY  (calc)        Status

  0     78.72    (11:48/15)     78.72    (11:48/15)    Ded,Wait=No
  1     38.77    ( 5:48/15)     43.81    ( 6:34/15)    Shr,Wait=No
  2     38.80    ( 5:49/15)     43.84    ( 6:34/15)    Shr,Wait=No
 all    52.10    (23:26/45)     55.46    (24:57/45)    Combined

For the Dedicated LCPU, both PCTCPUBY and PCTMVSBY are calculated
the same way:

PCTCPUBY = PCTMVSBY = 100*(DURATM-ORIGWAIT)/DURATM = 78.7%

For the Shared, Non-Wait LCPUs, the "LPAR Busy Time" is

PCTCPUBY = 100*LCPUPDTM/DURATM = 38.7%

but the IBM calculation for the "MVS Busy Time" is

PCTMVSBY = 100*(DURATM-ORIGWAIT)/DURATM = 43.8%

because DURATM minus ORIGWAIT also counts the 45 seconds of
non-dispatched, non-wait time as MVS Busy Time!

Again, while PCTCPUBY is legitimate, PCTMVSBY initially raised more
questions than it answered. Note: Jan 2006: However, it is now
used to calculate the SHORTCPS variable, and is thus useful.
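The percentages for the Dedicated example can be checked from the
mm:ss values in the diagram. The Python sketch below is an illustration
only; the helper names and the tuple layout are assumptions, and the
formulas are the ones quoted in this note (DURATM minus ORIGWAIT for the
dedicated LCPU and for "MVS Busy Time", LCPUPDTM for the shared LCPUs).

```python
def secs(mmss):
    """Convert 'mm:ss.hh' text to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


DURATM = secs("15:00.00")
PARTNCPU = 4

# LCPU number -> (dedicated?, dispatch time LCPUPDTM, ORIGWAIT),
# values copied from the diagram for system SYSA.
lcpus = {
    0: (True, secs("14:59.20"), secs("3:11.51")),
    1: (False, secs("5:48.92"), secs("8:25.73")),
    2: (False, secs("5:49.20"), secs("8:25.41")),
}


def busy_seconds(dedicated, dispatch, origwait):
    # Dedicated: LPAR busy is DURATM-ORIGWAIT; shared, Wait=No: dispatch.
    lpar_busy = DURATM - origwait if dedicated else dispatch
    mvs_busy = DURATM - origwait
    return lpar_busy, mvs_busy


rows = {n: busy_seconds(*v) for n, v in lcpus.items()}
for n, (lpar, mvs) in rows.items():
    print(f"LCPU={n} PCTCPUBY={100 * lpar / DURATM:.2f} "
          f"PCTMVSBY={100 * mvs / DURATM:.2f}")

capacity = len(lcpus) * DURATM
pctcpuby_all = 100 * sum(r[0] for r in rows.values()) / capacity
pctmvsby_all = 100 * sum(r[1] for r in rows.values()) / capacity
pctl5by = 100 * sum(v[1] for v in lcpus.values()) / (PARTNCPU * DURATM)
print(f"all    PCTCPUBY={pctcpuby_all:.2f} PCTMVSBY={pctmvsby_all:.2f}")
print(f"PCTL5BY={pctl5by:.2f}")
```

Working in seconds rather than rounded minutes is what makes the
combined values agree with the percentages in the table to two
decimal places.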
To summarize which percentages are printed where by IBM and reported
where by MXG: on the RMF CPU Activity Report, the "LPAR Busy Time Perc"
is variable PCTCPUBY, and the "MVS Busy Time Perc" is variable
PCTMVSBY in dataset TYPE70 (and now in TYPE70PR as well). On RMF's
Partition Data Report, IBM's "Logical Processors Total" is variable
LPCTnBY, and IBM's "Physical Processors Total" is PCTLnBY in dataset
ASUM70PR for each LPAR, and the "Physical Processors Total" is the
variable PCTCPUBY in ASUM70PR.

Note: I intend to revise this note as I learn more, especially for

millennium and/or MDF, in the near future. The purpose of this

much of the note was to document what is calculated by MXG and by

IBM when you try to compare RMF reports to MXG datasets, and to

point out basic problems if you have Dedicated or Wait Comp = YES.

Not only is there a problem in ASUM70PR in that we do not know the

true CPU busy time, we also have assumed the "capacity" was the

DURATM of the interval, but that is not always the case, especially

when LPAR weighting is taken into account. No single percentage

value can be used, as it depends on your perspective. ASUM70PR

reports usage percentages of the "dispatch" capacity, while TYPE70

still must be used to understand what is happening inside each MVS.

2. FAT32 file system reduces space needed for MXG from 139MB to 68MB.
On Windows 95 and Windows NT with FAT File Systems, the MXG Source

Library directory DIR command shows 3549 files totaling 57.7 MB,

but the files in that directory actually required 139.1 Megabytes

of disk space! The 2GB disk drive with 32K cluster size wastes

space if the file is less than 32KBytes, and as only 272 of MXG's

source files are over 32K in size, the other 3277 small files waste

lots of disk space with large cluster size under FAT file systems.
Well that is a dead problem with the newer FAT32 file system that

virtually eliminates the space waste problem. That same source

library required only 68.23 MegaBytes on a 9GB FAT32 disk drive!
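The space waste comes from each file occupying a whole number of
clusters. The Python sketch below shows that rounding for two cluster
sizes. The three file sizes are made up for illustration, and the 4K
cluster size is an assumption (this note does not state the cluster
size of the FAT32 drive).

```python
import math


def allocated_bytes(file_sizes, cluster_bytes):
    """Disk space used when every file occupies whole clusters."""
    return sum(math.ceil(size / cluster_bytes) * cluster_bytes
               for size in file_sizes)


# Hypothetical file sizes in bytes (not the real MXG source library).
sizes = [1_200, 9_500, 40_000]
KB = 1024
for cluster in (32 * KB, 4 * KB):
    used = allocated_bytes(sizes, cluster)
    print(f"{cluster // KB:>2}K clusters: {used} bytes allocated "
          f"for {sum(sizes)} bytes of data")
```

A library made mostly of files far smaller than the cluster size, like
the 3277 small MXG members, pays nearly a full cluster per file, which
is why the larger cluster size more than doubles the space required.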

III. MVS Technical Notes.

1. APAR OW25609 corrects a stoppage of SMF type 30 interval records

(subtypes 2 & 3) and type 23 records, after a serialization problem.

The APAR applies to MVS/4.3 thru OS/390 2.4.
2. APAR OW28289 changes counts in type 30 variables TAPNMNTS/TAPSMNTS

(SMF30PTM/SMF30TPR). In DF/SMS 1.2 and earlier, tape mount counts

were the number of physical mounts (actually, a count of volumes

that were verified by OPEN/CLOSE/EOV via a loadpoint read of the

VOL1 tape label). That was changed by an SPE to DF/SMS 1.2.0 (which

was included in DF/SMS 1.3.0 and 1.4.0); IBM decided instead to

count logical volumes (i.e., increment the mount count when OPEN

processing is entered with the tape drive in a ready state and with

the mounted volume at loadpoint). A document change was prepared

but never distributed, and now IBM is backing out the SPE's effect,

and with this APAR, the counts revert to physical mount counts. The

APAR's text is confusing, because it lists PTFs for DF/SMS releases

1B0, 1C0, and 1D0, which turn out to be DFSMS 1.2, 1.3, and 1.4,

respectively. If you depend on the count of tape mounts in type 30

records, you will want to apply this PTF.
3. APAR OW28613 corrects errors in the JES2 Type 26 Purge record in the
SMF26OAG Accounting Section offset. I earlier thought MXG would not
fail, but without that APAR, MXG offset validation was insufficient,
and an INPUT STATEMENT EXCEEDED error occurs. Now, Change 15.330
circumvents the wrong value for SMF26OAG, but the ACCOUNTn fields in
TYPE26J2 will be blank until you install the APAR to correct IBM's
error. Fortunately, MXG only uses the TYPE26J2 ACCOUNTn fields for jobs
that do not produce type 30s (JCL Errors or Cancel before start).

4. APAR OW28256 reports invalid CPU times measured (once again!) in RMF

type 72 field SMF72RCT (MXG Variable CPURCTTM, which is summed into

variable CPUTM); PTF was available November 14 1997. This causes

the total CPU time captured in type 72 records to exceed the total

CPU busy time, causing the Uncaptured CPU time (misnamed as CPUOVHTM

and labeled as "Overhead") to be negative in RMFINTRV. This same

field was in error in 1992, fixed then by APAR OY51878. MXG now

detects the negative value and prints this error message on the log:

"ERROR. NEGATIVE CPU-UNCAPTURED-TIME (TYPE70-TYPE72)".

See text of Change 15.238 for more details.
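The check described in this note amounts to a subtraction and a sign
test. The Python sketch below is only an illustration with made-up
interval totals; the function and argument names are hypothetical and
this is not the MXG logic itself.

```python
def uncaptured_cpu(cpu_busy_type70, cpu_captured_type72):
    """Return uncaptured CPU seconds and whether the value is suspect."""
    uncaptured = cpu_busy_type70 - cpu_captured_type72
    # Captured time can never legitimately exceed total CPU busy time.
    return uncaptured, uncaptured < 0


# Hypothetical interval totals, in seconds.
value, suspect = uncaptured_cpu(cpu_busy_type70=820.0,
                                cpu_captured_type72=835.5)
if suspect:
    print(f"NEGATIVE CPU-UNCAPTURED-TIME: {value:.1f} seconds")
```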

5. APAR OW26619 for OS/390 V2.4, in Goal Mode corrects WLM errors found

by IBM during final function test, and corrects SMF values.

6. APAR OW26421 for OS/390 V1.3 is needed only for ASMTAPES. In OS/390

IBM created two 4-byte fields for Y2K support to replace the 3-byte

fields JCTSSD and JCTJMRJD (step and job start/init dates), but I

missed that change, so ASMTAPES still used the 3-byte fields. But

IBM also zeroed the 3-byte fields, which caused INVALID DATA when
TYPETMNT was executed, and left variable INITTIME with a missing value.

This APAR restores the dates in the 3-byte fields, so INITTIME will

not be missing. The ML-15 of the MXG ASMTAPES avoids the exposure

by using the 4-byte fields if they are present.
7. SYNCSORT 3.6 can ABEND 0C9 during a PROC SORT; SYNCSORT fix SY49930

is the correction.

8. APAR OW30153 corrects type 30 Measured Usage (MULC) segments. There

are multiple occurrences of the same product name and qualifier for

PRODNAME=CICS PRODQUAL=DFHKETCB in the interval records that should

have had only a single segment. There are still other errors that

are not addressed in creating the subtype 4 and subtype 5 records

from the interval records. One CICS job had 39 DFHKETCB segments in

its interval records (subtype 2 and 3), but had 37 segments in its

step termination record (subtype 4) and then had only 36 segments in

its job termination record (subtype 5). Further, the job had 12

DFHSIP segments in the interval records but had 16 segments in both

step and job terminate. Finally, the job had 2 DFHDUP segments in

the job term but none in either the interval or step term records.

A new problem has been opened with IBM on this error.

Note that old APAR OW16176, which consolidates MULC sections for

each product, should be installed. Increasing SMF buffers with

APAR OW12836 is also recommended to minimize the problems with SMF

buffers, and especially specification of DDCONS=NO in SMFPRMxx in

SYS1.PARMLIB is strongly recommended to eliminate the SMF address

algorithm to consolidate DD segments.

Note added Dec 30, 1997:

APAR PN80497 corrects a problem after applying UN84065 with Measured

Usage (MULC) that can create millions of type 30 subtype 3 records

with the same product name in the MULC segment. The problem

occurred with an IMS BMP that used MQ Series. The excess records

could cause IEE979W SMF DATA LOST - NO BUFFER SPACE AVAILABLE.
9. APAR OW30059 (PTF available 12Dec97) reports type 42 values for

Direct Write and Direct Read SMF42DWB/SMF42DRB and this APAR is

likely the fix that was originally described in note 26 in MVS

Technical Notes in MXG Newsletter THIRTY-TWO for APAR OW20926.

When the channel program did single CI reads and writes, residual

data was left in the counter that was not used.

10. APAR PQ09396 (Target 26Dec97) for MQSERIES SMF type 116 reports
inconsistencies between the statistics in type 115 and 116 records.

11. APAR PQ09083 is for subtype '51'x of the FTP SMF record (VMACFTP).

The text mentions SMF Record Type 51, but there is no type 51 SMF

record (yet). The APAR corrects missing values in variables

DVGSETME/DVGSEDTE in dataset FTP51X.

12. Job Accounting for Started Tasks became available with MVS/ESA 5.1,

because you can now have a JOB card in the JCL for your STC's, and

can put ACCOUNT parameters in that JOB card that show up in MXG's

ACCOUNTn variables in PDB.JOBS/PDB.PRINT/PDB.STEPS datasets. The

JCL Reference Manual Sections 7.2, 7.3, and 16.7 discuss how.
13. What happens to measurements if I have a Y2K Test System in an LPAR?
You can use the ASUM70PR dataset and select the observations from
your production LPAR (SYSTEM='PROD') to measure the Y2K Partition's
resources, since the STARTIME of the records with SYSTEM='PROD'
will be your local time of day.

All of the records written on SYSTEM='Y2K' will have the year 2000

dates (although the READTIME value could be earlier if jobs were

read into the hold queue before IPLing with year 2000). Since the

Y2K system will be re-IPLed repetitively with the same start value

(probably 31DEC99:23:45:00), RMF interval data will appear to have

duplicate data and the jobs/steps from all IPLs will be jumbled

together, because MXG sorts RMF data by STARTIME and job data by

READTIME.

You can extract SYSTEM='Y2K' data for a specific "test run" by

finding the record number (_N_) of each SMF IPL record, using:

   %INCLUDE SOURCLIB(VMACSMF);
   DATA _NULL_;
     _SMF;                     /* MXG macro defined in VMACSMF */
     /* SMF ID=0 is the IPL record; print its record number */
     IF ID=0 THEN PUT 'IPL RECORD FOUND ' _N_= SMFTIME=;
   RUN;

and then use the record number of the specific IPL to select only

the SMF records desired. If you wanted the third run, and the third

IPL record had _N_=8,000 and the next IPL record had _N_=10,000, you

would use this logic:

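   %INCLUDE SOURCLIB(VMACSMF);
   DATA _NULL_;
     _SMF;
     FILE SMFOUT DCB=SMF;      /* output DD uses the input file's DCB */
     /* copy the third run: from its IPL record up to the next IPL */
     IF 8000 LE _N_ LE 9999 THEN PUT _INFILE_;
     IF _N_ EQ 10000 THEN STOP;
   RUN;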

to write to //SMFOUT DD only those records for that test run.
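The record-number bookkeeping in the two steps above can be combined:
find every IPL record, then take the range from the chosen IPL up to
the record before the next one. The Python sketch below illustrates
only that selection logic on a made-up list of SMF record IDs. It does
not read SMF data, and the function name is hypothetical; the SAS steps
above remain the way to produce the extract.

```python
def run_bounds(record_ids, run_number):
    """Return (first, last) 1-based record numbers of one test run.

    A run starts at its IPL record (SMF ID 0) and ends just before the
    next IPL record, or at the last record if no later IPL exists.
    """
    ipl_positions = [n for n, rec_id in enumerate(record_ids, start=1)
                     if rec_id == 0]
    first = ipl_positions[run_number - 1]
    if run_number < len(ipl_positions):
        last = ipl_positions[run_number] - 1
    else:
        last = len(record_ids)
    return first, last


# Hypothetical stream of SMF record IDs; 0 marks an IPL record.
ids = [0, 30, 70, 0, 30, 0, 30, 72, 70, 0, 30]
first, last = run_bounds(ids, 3)
selected = ids[first - 1:last]
print(first, last, selected)
```

Applied to the example in the text, the third IPL record at 8,000 and
the next at 10,000 give the range 8,000 through 9,999.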

There is an alternative. You can use the IPL PROMPT feature to

require the operator to reply with the (local) time and the reason

(describe the test run) for each IPL, and there will be a SUBTYPE=8

observation in dataset TYPE90 with variables DTIME and IPLREASN with

the operator's reply, so the TYPE90 dataset can be used to identify

the records in each test run (variable SMFRECNR, equal to _N_, was

added to the TYPE90 dataset by Change 15.267).

You must have specified PROMPT(IPLR) or PROMPT(ALL) in member

SMFPRMxx in SYS1.PARMLIB dataset to prompt the operator for the

reply at each IPL.
14. Almost-Duplicate TYPE74 records, differing only by one second in the

STARTIME, can be written by Boole & Babbage's CMF Product, if both

IPM and CPM modes are enabled. This has happened recently as sites

installed OS/390. In MXG's TYPE7xxx datasets, variable PRODUCT will

be 'CMF-IPM' in one almost-duplicate record, and 'CMF-CPM' in the

other observation. Boole does NOT recommend both modes!

15. Channel Type variable CHANTYPE in dataset TYPE73 still exists, but

variable SMF73CPD provides a better description as it describes both

ESCON and Parallel Channel types. SMF73CPD was new in MVS/ESA 5.1.
16. APAR OW27855 corrects PSF/MVS-written type 6 SMF records so that

they now contain the node number of the current node in field

SMF6ROUN, which MXG decodes into variable NODE and RMOTID in TYPE6

dataset.
17. APAR OW20844 enables JES2 job numbers greater than 32000, but has

no impact on MXG, since MXG has supported 5-digit JES Numbers thru

99999 from the JCTJOBID for several years.

IV. DB2 Technical Notes.
1. There are no DB2 Technical Notes in this newsletter.
V. IMS Technical Notes.
1. Support for Boole's IMF 3.2 (for IMS 6.1) was added in MXG 15.09.

Candle has not informed me of any changes in their ITRF product.

2. Discussion of IMS Log support in MXG Software.
I strongly recommend you use an IMS monitor (Boole or Candle)

that creates a transaction record, rather than attempt to use

IBM's IMS log for transaction response and resource measurement.
See MXG newsletter TWENTY-FIVE, IMS Technical Notes, for the MXG

position statement of the technical reasons why you cannot measure

the response time and resources (CPU, DL/I calls) for transactions

with only IBM's standard IMS log records.

However, you CAN use the TYPEIMS7 MXG program to get accurate counts
of transactions and total resources by transaction name, because it
uses IMS 07 and IMS 08 log records, written for each deschedule of an
IMS program, which contain the count of IMS transactions that were run
during that program schedule (can be 1, usually is at least 5
transactions per schedule, and can be millions for WFIs), the
transaction name, and the total CPU time and DL/I calls for all
of those IMS transactions. But you cannot get accurate resources
for an individual transaction from the IMS 07/08 records. At best, you
can get the "average" of each group of transactions processed if you
are willing to divide the CPU time by the number of transactions run,
and you'll get fractional numbers of DL/I calls per transaction!
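The averaging described above is a plain division of the schedule
totals by the transaction count. The Python sketch below uses made-up
numbers for one program schedule; the function and argument names are
hypothetical and are not MXG variable names.

```python
def per_transaction_average(cpu_seconds, dli_calls, transactions):
    """Average CPU time and DL/I calls per transaction for one schedule."""
    return cpu_seconds / transactions, dli_calls / transactions


# Hypothetical totals from one IMS 07 record: 5 transactions were run
# in the schedule, using 1.8 CPU seconds and 37 DL/I calls in total.
avg_cpu, avg_dli = per_transaction_average(1.8, 37, 5)
print(f"average CPU {avg_cpu:.2f} sec, average DL/I calls {avg_dli:.1f}")
```

The fractional DL/I count is the visible sign that these are averages
over the schedule and not measurements of any single transaction.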

MXG Member TYPEIMFL will read the IMS log and will select and create

all possible datasets from any combination of Boole's IMF log

records (LCODE=FAx), IBM IMS log records (01,03,07,08,31x,36x,40x,

plus fastpath 59x subtypes 01,03,36x,37x,38x) and SAP IMS log

records (LCODE=AEx). Members TYPEIMFL and TYPEIMS7 both use macros

that are defined in VMACIMS to decode those IMS log records, and

which are fully supported by MXG.
It is not the reading of the IMF, IBM, and SAP IMS log records that

is the problem, but rather it is the construction of the

many-records-per-transaction-without-a-merge-key into a single

transaction record with per-transaction resources and response that

is in principle impossible with IMS log records.
Nevertheless if you still must try to get IMS response time with

only IBM's IMS log records, because your management still won't buy

you an IMS monitor tool, then, at your own risk, you can probably

get good results with the MXG assembly program ASMIMSL5 (IMS 5) or

ASMIMSLG (IMS 4) and their JCLIMSLG example. The ASM program acts

like an IMS MPR and reads the log to figure out which records go

with which transaction, and writes a copy of the IMS log records

with an appendage to identify the transaction, and then the MXG SAS

programs invoked in JCLIMSLG read the extended IMS log records to

create dataset IMSTRAN with observations on a per-transaction basis.

These transaction records will always contain only average CPU and

DL/I calls, but the response time for each transaction is usually

quite accurate, although a few transactions may not be perfectly

matched and can have very large response times (and sometimes the

output queue time is accurately very large!). It is not guaranteed

that ASMIMSL6 will exist, but it is my hope to continue to provide

this crutch for IMS sites unwilling to purchase an IMS monitor.

VI. SAS Technical Notes.

1. There are no MXG problems using the Version 6.09 of the SAS System.

In fact, there have been no MXG problems with Version 6.08 at TS430

or later maintenance levels! Perhaps that is because MXG Software

is now a standard part of the SAS Quality Assurance test stream?

VII. CICS Technical Notes.
1. How can you use USER instead of TERMINAL to bill CICS transactions?

IBM note RTA000013242 Library item Q451666 answers the question,
"How can you use USER instead of TERMINAL to bill CICS transactions
in an ISC or MRO CICS environment (i.e., when using transaction
routing)?", by pointing out that when you specify USERSEC=IDENTIFY
or ATTACHSEC(IDENTIFY) on the SYSTEM entry or CONNECTION definition,
the USER field is then propagated into the records created in the
AOR and other regions, and so into their observations in
CICSTRAN.CICSTRAN.

If you are billing CICS and DB2 by transactions, you really should

look at the ASUMUOW member that summarizes CICSTRAN and DB2ACCT and

their CPU times into one record per Unit of Work, reducing the

number of "things" you have to count. ASUMUOW keeps both TERMINAL

and USER as well as both CICS and DB2 CPU times plus CICS response

buckets in its output dataset PDB.ASUMUOW. If you were using

ASUMCICS to create PDB.CICS summary data, you will find ASUMUOW

preserves the CICS resource and response fields from PDB.CICS and

adds in the DB2 information. ASUMUOW replaces the earlier ANALDB2C

report program that merged DB2ACCT and CICSTRAN records.

VIII. Windows NT Technical Notes.
