Model-Specific CPU Module Interface
===================================
Interface:
Introduction
------------
The model-specific cpu module interface layers on top of the
cpu module interface specified in and module
implementations thereof as per .
The canonical cpu module implementation is cpu.generic. At this
time the plan is that cpu.generic will be the *only* cpu module
implementation, and all model-specific support should be delivered
via the model-specific API as utilised in cpu.generic. Future
needs may change that design.
Success/Failure Semantics
-------------------------
Where an indication of success or failure is required or makes sense,
API members will return a cms_errno_t. Anything other than CMS_SUCCESS
is a failure.
cms_init
--------
extern void cms_init(void);
This is called from cmi_init following successful cpu module
initialization for a cpu. cms_init looks in directories
/platform/i86pc/kernel/cpu and /platform/i86pc/kernel/cpu/amd64
for a module with pathname matching (in preference order):
cpu_ms....
cpu_ms...
cpu_ms..
cpu_ms.
It is not an error for no matching module to be found - the
cpu module for this cpu will simply have to "go it alone"
without additional model-specific support.
If a match is found the module is loaded and its cms_init entry point
from its cms_ops_t vector is called.
cms_present
-----------
extern boolean_t cms_present(void);
This returns B_TRUE if the current cpu has successfully initialized a
model-specific cpu module. The caller is responsible for ensuring that
the "current" cpu cannot change for as long as it requires the
answer to be valid.
cms_post_startup
----------------
extern void cms_post_startup(void);
A cpu module can call this from its cmi_post_startup entry point
to perform additional "global" configuration once boot cpu startup is complete.
cms_post_mpstartup
------------------
extern void cms_post_mpstartup(void);
A cpu module can call this from its cmi_post_mpstartup entry point
to perform additional "global" configuration once MP startup is complete.
cms_logout_size
---------------
extern size_t cms_logout_size(cpu_t *);
This calls into the model-specific cpu module implementation to
determine the amount of additional space model-specific support requires
in the error logout structure to record model-specific error
telemetry. The cpu module should make a buffer of this size
available for model-specific logout when it calls cms_bank_logout.
XXFM Should this pass nbanks?
cms_mcgctl_val
--------------
extern uint64_t cms_mcgctl_val(int, uint64_t);
This calls into the model-specific module implementation to obtain
a value that should be used to initialize MCG_CTL. The first argument
is the number of MCA banks on the current cpu (from MCG_CAP)
and the second is the proposed/default MCG_CTL value.
If the model-specific module does not implement a cms_mcgctl_val
entry point the default is returned; otherwise that entry point
is called and is expected to return a suitable MCG_CTL value or
the default value which is passed along to it.
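The fall-back behaviour described above could be sketched as follows.
This is a minimal illustration, not the actual wrapper: the ex_ names,
the one-member ops vector, the global that stands in for "the ops of the
current cpu" and the bit cleared by the sample entry point are all
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down ops vector; the real cms_ops_t has more members. */
typedef struct ex_cms_ops {
	uint64_t (*cms_mcgctl_val)(int nbanks, uint64_t proposed);
} ex_cms_ops_t;

/* Stand-in for the ops vector of the current cpu; NULL means no module. */
static const ex_cms_ops_t *ex_cur_ops = NULL;

/* Return the proposed value unless a model-specific entry point exists. */
uint64_t
ex_cms_mcgctl_val(int nbanks, uint64_t proposed)
{
	if (ex_cur_ops == NULL || ex_cur_ops->cms_mcgctl_val == NULL)
		return (proposed);
	return (ex_cur_ops->cms_mcgctl_val(nbanks, proposed));
}

/* Hypothetical entry point: clears bit 4 (an arbitrary choice). */
static uint64_t
ex_model_mcgctl_val(int nbanks, uint64_t proposed)
{
	(void) nbanks;
	return (proposed & ~(1ULL << 4));
}

static const ex_cms_ops_t ex_model_ops = { ex_model_mcgctl_val };
```

With no module attached the proposed value comes straight back; once
ex_cur_ops points at ex_model_ops the entry point gets the final say.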
cms_bankctl_skipinit
--------------------
extern boolean_t cms_bankctl_skipinit(int);
If the model-specific module is present and implements a cms_bankctl_skipinit
entry point then this is called to determine whether initialization
of the specified MCA bank number for this cpu should be skipped;
otherwise B_FALSE is returned.
cms_bankctl_val
---------------
extern uint64_t cms_bankctl_val(int, uint64_t);
If the model-specific module is present and implements a cms_bankctl_val
entry point then this is called to determine the value with which
to initialize MCi_CTL for the given bank number (first argument).
The second argument provides a proposed/default value which is
returned if no model-specific entry point is implemented and is also
passed into any model-specific entry point.
cms_bankstatus_skipinit
-----------------------
extern boolean_t cms_bankstatus_skipinit(int);
If the model-specific module is present and implements a cms_bankstatus_skipinit
entry point then this is called to determine whether to skip initialization
of MCi_STATUS for this MCA bank.
cms_bankstatus_val
------------------
extern uint64_t cms_bankstatus_val(int, uint64_t);
If the model-specific module is present and implements a cms_bankstatus_val
entry point then this is called to determine the value with which to
initialize MCi_STATUS. The second argument provides the proposed/default
value.
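A cpu module might combine the four bank initialization calls roughly
as below. This is only a sketch: the MSRs are simulated with arrays,
the ex_ functions stand in for the cms_bankctl_* and cms_bankstatus_*
members, and the defaults used here (all ones for MCi_CTL, zero for
MCi_STATUS) as well as the banks that are skipped are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define	EX_NBANKS	4

/* Simulated MCi_CTL and MCi_STATUS registers, one slot per bank. */
static uint64_t ex_mci_ctl[EX_NBANKS];
static uint64_t ex_mci_status[EX_NBANKS];

/* Stand-ins for the cms_* calls; the choices made are arbitrary. */
static bool
ex_bankctl_skipinit(int bank)
{
	return (bank == 2);
}

static uint64_t
ex_bankctl_val(int bank, uint64_t proposed)
{
	return (bank == 0 ? (proposed & ~1ULL) : proposed);
}

static bool
ex_bankstatus_skipinit(int bank)
{
	return (bank == 3);
}

static uint64_t
ex_bankstatus_val(int bank, uint64_t proposed)
{
	(void) bank;
	return (proposed);
}

/* Initialize each bank, honouring the skip and value overrides. */
void
ex_mca_bank_init(int nbanks)
{
	int bank;

	for (bank = 0; bank < nbanks && bank < EX_NBANKS; bank++) {
		if (!ex_bankctl_skipinit(bank))
			ex_mci_ctl[bank] = ex_bankctl_val(bank, ~0ULL);
		if (!ex_bankstatus_skipinit(bank))
			ex_mci_status[bank] = ex_bankstatus_val(bank, 0);
	}
}
```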
cms_mca_init
------------
extern void cms_mca_init(int);
This may be called from the cmi_mca_init entry point of a cpu module
implementation. The argument provides the number of MCA banks on
this cpu. If the model-specific support is present and implements
a cms_mca_init entry point it is called to allow it to perform
any additional (non-architectural) MCA initialization.
cms_poll_ownermask
------------------
extern uint64_t cms_poll_ownermask(hrtime_t);
When there are multiple cores on a chip it can be the case that
some share MCA bank facilities - e.g., all cores see the same
NorthBridge MCA registers on AMD family 15 (unless a bit is set
to force all but core 0 to read all-zeroes). In such cases
only a single core should poll that shared resource.
The cms_poll_ownermask function can be called to retrieve a bitmask
of MCA banks that should be polled by the caller. The caller
is responsible for ensuring that the "current" cpu cannot change.
If bit N is clear in the returned value then MCA bank N should
*not* be polled by the caller.
The argument is the polling interval in use. If the model-specific
support is absent or does not implement a cms_poll_ownermask
entry point then -1ULL is returned - all bits set so the caller
can poll all banks.
If model-specific support is present it is up to it how it determines
who should poll which banks. The poll interval argument is present
so that the implementation can decide that a "current owner" has gone
quiet and that a new owner may be granted (it is generally desirable that
the "current owner" not flip-flop between the cores of a chip).
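The two halves of this arrangement could be sketched as below. The
first function shows how a caller might apply the returned mask; the
second shows one possible ownership policy in which a shared bank is
handed over only after the owner has been quiet for more than two poll
intervals. Everything named ex_ is hypothetical, the factor of two is
an arbitrary assumption, the current time is passed in explicitly to
keep the sketch testable, and the locking a real implementation would
need is omitted.

```c
#include <stdint.h>

typedef int64_t ex_hrtime_t;	/* stand-in for hrtime_t (nanoseconds) */

/* Collect the bank numbers whose bit is set in ownermask. */
int
ex_banks_to_poll(uint64_t ownermask, int nbanks, int *banks)
{
	int bank, n = 0;

	for (bank = 0; bank < nbanks && bank < 64; bank++) {
		if (ownermask & (1ULL << bank))
			banks[n++] = bank;
	}
	return (n);
}

/* Owner of one shared bank; core is -1 while nobody owns it. */
typedef struct ex_owner {
	int		core;
	ex_hrtime_t	last_poll;
} ex_owner_t;

/* Return 1 if core owns the shared bank at time now, else 0. */
int
ex_claim_shared(ex_owner_t *o, int core, ex_hrtime_t now,
    ex_hrtime_t interval)
{
	if (o->core == core || o->core == -1 ||
	    now - o->last_poll > 2 * interval) {
		o->core = core;
		o->last_poll = now;
		return (1);
	}
	return (0);
}
```

Because a non-owner is refused until the owner has missed its polls,
ownership does not flip-flop between cores that poll at similar times.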
cms_bank_logout
---------------
extern void cms_bank_logout(int, uint64_t, uint64_t, uint64_t, void *);
This may be called from cpu module logout code (grabbing error telemetry
at a poll or machine check event) to allow model-specific support
to logout additional model-specific telemetry.
The first argument is the MCA bank number, the next three are the
MCi_STATUS, MCi_ADDR and MCi_MISC MSR register values for the bank
(already read in the cpu module, and presumably with the valid bit set
in the status value), and the last is a pointer to a buffer sized
as per cms_logout_size which the model-specific support may use
in any way it likes. This same buffer will be quoted in other
calls into the model-specific support.
While this function may be called for each MCA bank of a cpu,
the buffer that is passed into it should be the same on every
call.
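The buffer handling described above might look like the following
sketch. The layout of the logout area is entirely up to the
model-specific support, so ex_mslogout_t is an invented example, and
ex_logout_size and ex_bank_logout are stand-ins for cms_logout_size and
cms_bank_logout. The only architectural fact relied on is that the
valid bit is bit 63 of MCi_STATUS.

```c
#include <stdint.h>
#include <stdlib.h>

#define	EX_STATUS_VAL	(1ULL << 63)	/* MCi_STATUS valid bit */

/* Hypothetical model-specific logout area. */
typedef struct ex_mslogout {
	uint64_t	banks_seen;	/* bitmask of banks logged */
	uint64_t	last_status;	/* status of the last bank logged */
} ex_mslogout_t;

size_t
ex_logout_size(void)
{
	return (sizeof (ex_mslogout_t));
}

/* Model-specific side: record what it wants in its own area. */
void
ex_bank_logout(int bank, uint64_t status, uint64_t addr, uint64_t misc,
    void *mslogout)
{
	ex_mslogout_t *l = (ex_mslogout_t *)mslogout;

	(void) addr;
	(void) misc;
	if (l == NULL)
		return;
	l->banks_seen |= 1ULL << bank;
	l->last_status = status;
}

/* cpu module side: one buffer, passed unchanged for every bank. */
void *
ex_logout_all(const uint64_t *status, int nbanks)
{
	void *buf = calloc(1, ex_logout_size());
	int bank;

	if (buf == NULL)
		return (NULL);
	for (bank = 0; bank < nbanks; bank++) {
		if (status[bank] & EX_STATUS_VAL)
			ex_bank_logout(bank, status[bank], 0, 0, buf);
	}
	return (buf);
}
```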
cms_msrinject
-------------
extern cms_errno_t cms_msrinject(uint_t, uint64_t);
Calls the cms_msrinject entry point of the model-specific code, if
present, to write the specified value to the specified MSR; otherwise
returns CMSERR_NOTSUP. This is intended for use from the cmi_mca_msrinject
entry point of a cpu module.
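As an illustration of the success/failure convention applied to this
member, the sketch below returns a "not supported" error when there is
no model-specific entry point and otherwise passes the request through.
The enum values, the ex_inject_ops_t vector and the sample entry point
(which only records the request instead of writing an MSR) are
assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; only "0 means success" mirrors the text. */
typedef enum ex_cms_errno {
	EX_CMS_SUCCESS = 0,
	EX_CMSERR_NOTSUP
} ex_cms_errno_t;

/* Hypothetical cut-down ops vector with a single member. */
typedef struct ex_inject_ops {
	ex_cms_errno_t (*cms_msrinject)(unsigned int msr, uint64_t val);
} ex_inject_ops_t;

ex_cms_errno_t
ex_cms_msrinject(const ex_inject_ops_t *ops, unsigned int msr,
    uint64_t val)
{
	if (ops == NULL || ops->cms_msrinject == NULL)
		return (EX_CMSERR_NOTSUP);
	return (ops->cms_msrinject(msr, val));
}

/* Sample entry point: remembers the request rather than doing a wrmsr. */
static unsigned int ex_last_msr;
static uint64_t ex_last_val;

static ex_cms_errno_t
ex_model_msrinject(unsigned int msr, uint64_t val)
{
	ex_last_msr = msr;
	ex_last_val = val;
	return (EX_CMS_SUCCESS);
}

static const ex_inject_ops_t ex_model_ops = { ex_model_msrinject };
```

A caller treats anything other than the success value as a failure,
for example by falling back to its own injection path.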
cms_error_action
----------------
extern uint32_t cms_error_action(int, int, uint64_t, uint64_t, uint64_t,
void *);
Calls the cms_error_action entry point of the model-specific support,
if present; otherwise returns 0.
This is intended to be called from cpu module error handling code.
It permits model-specific support to perform additional handling of the
error (say a cache flush) and to indicate the higher-level scope
and impact of the error.
If the returned value includes CMS_ERRSCOPE_UNCORRECTED then there
is uncorrected data still present in the system (even after possible
additional handling in the model-specific module). If the flag is
absent then from the point of view of the model-specific support
there is no uncorrected data in the system, but the cpu module should
still check and honour PCC etc.
If in addition to CMS_ERRSCOPE_UNCORRECTED the returned value also
indicates CMS_ERRSCOPE_POISONED then the bad data remains in the
system but has been poisoned or otherwise signalled such that
it cannot be used and mistaken for good data.
If in addition to CMS_ERRSCOPE_UNCORRECTED the returned value also
indicates CMS_ERRSCOPE_CURCONTEXT_OK then the current context is
unaffected by the error.
The intention of these return flags is to answer these questions:
- is there uncorrected data present in the system as a result of this error?
- if so, has the potential impact of that bad data been constrained by
signalling the data as bad (e.g., with data poisoning)?
- is the current context affected by the uncorrected data?
so that we can decide whether panic, contract kill etc is the appropriate
response.
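One way a cpu module could turn these flags into a response is sketched
below. The flag values and the policy itself are assumptions chosen for
illustration; they are not taken from the real headers or from
cpu.generic. The pcc argument stands for the PCC bit of MCi_STATUS,
which the cpu module is expected to check regardless of the returned
flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real definitions may differ. */
#define	EX_ERRSCOPE_UNCORRECTED		0x1u
#define	EX_ERRSCOPE_POISONED		0x2u
#define	EX_ERRSCOPE_CURCONTEXT_OK	0x4u

typedef enum ex_response {
	EX_RESP_CONTINUE,
	EX_RESP_KILL_CONTEXT,
	EX_RESP_PANIC
} ex_response_t;

/* Illustrative policy only. */
ex_response_t
ex_choose_response(uint32_t action, bool pcc)
{
	/* Processor context corrupt: nothing can be trusted. */
	if (pcc)
		return (EX_RESP_PANIC);
	/* No uncorrected data left in the system. */
	if (!(action & EX_ERRSCOPE_UNCORRECTED))
		return (EX_RESP_CONTINUE);
	/* Bad data that is not marked bad could be consumed silently. */
	if (!(action & EX_ERRSCOPE_POISONED))
		return (EX_RESP_PANIC);
	/* Poisoned, and the current context did not touch it. */
	if (action & EX_ERRSCOPE_CURCONTEXT_OK)
		return (EX_RESP_CONTINUE);
	/* Poisoned, but the current context is affected. */
	return (EX_RESP_KILL_CONTEXT);
}
```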
cms_disp_match
--------------
extern void *cms_disp_match(cpu_t *, int, uint64_t, uint64_t, uint64_t, void *);
Calls the cms_disp_match entry point of the model-specific support, if
present; otherwise returns NULL.
A non-NULL return is a cookie meaningful only to the model-specific
implementation, which should be quoted in subsequent calls to
cms_ereport_class, cms_ereport_detector, cms_ereport_includestack
and cms_ereport_add_logout for this error.
We are not necessarily running on the cpu that detected or experienced
the error - cms_disp_match is intended to be called in post-handling
logging code which may be running on another cpu. The first argument
indicates the cpu which detected/experienced the error.
The model-specific support may use the indicated bank number and the
MCi_STATUS, MCi_ADDR, MCi_MISC values already read for this bank
along with any information from the model-specific logout area
pointed to by the last argument to classify the error and associate
some cookie that represents this classification (e.g., a pointer
to a data structure that holds all info about this error type).
This API member is expected to be called from errorq drain processing
code.
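A model-specific implementation might classify errors with a small
table and hand back a pointer to the matching entry as the cookie, as
in this sketch. The table layout, the mask/match values and the second
leaf class name are invented for illustration; the cpu argument is
omitted to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one error type. */
typedef struct ex_disp {
	int		bank;		/* MCA bank this entry applies to */
	uint64_t	mask;		/* status bits to compare */
	uint64_t	match;		/* required value of those bits */
	const char	*cpuclass;
	const char	*leafclass;
} ex_disp_t;

/* Mask and match values below are made up. */
static const ex_disp_t ex_disp_table[] = {
	{ 0, 0xffffULL, 0x0135ULL, "amd64", "dcache_parity" },
	{ 1, 0xffffULL, 0x0151ULL, "amd64", "example_leaf" }
};

/* Return the matching table entry as an opaque cookie, or NULL. */
const void *
ex_disp_match(int bank, uint64_t status, uint64_t addr, uint64_t misc,
    const void *mslogout)
{
	size_t i, n = sizeof (ex_disp_table) / sizeof (ex_disp_table[0]);

	(void) addr;
	(void) misc;
	(void) mslogout;
	for (i = 0; i < n; i++) {
		const ex_disp_t *d = &ex_disp_table[i];

		if (d->bank == bank && (status & d->mask) == d->match)
			return (d);
	}
	return (NULL);
}
```

Only the model-specific code interprets the cookie; the caller simply
quotes it back in the later ereport calls.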
cms_ereport_class
-----------------
extern int cms_ereport_class(cpu_t *, void *, const char **, const char **);
Calls the cms_ereport_class entry point of the model-specific support,
if present; otherwise returns 0. The model-specific support should
determine the ereport class name to be used in logging this error.
The arguments are the cpu which detected or experienced the error,
the cookie returned by a previous call to cms_disp_match for this
error data, and two pointers to const char * which the model-specific
code can point to strings for the cpu class and leaf class
to be used in the ereport class for this error. These two
pointers are NULLed before calling model-specific code, and
the call is considered successful (meaning cms_ereport_class will
return nonzero) if both these pointers are no longer NULL after
the call to model-specific code.
The "cpu class" string is the subclass to use before the
final error leaf class. For example, if an ereport should
have class .amd64.dcache_parity then the cpu class
is "amd64" and the leaf class "dcache_parity"; the consumer code
is responsible for constructing the final ereport class string
by prepending something like FM_ERROR_CPU to these strings.
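The success test and the class construction described above could be
sketched as follows. The function-pointer type, the ex_ names and the
idea of passing the prefix in as a plain string are assumptions; the
real prefix is whatever the consumer uses (something like FM_ERROR_CPU)
and is deliberately left as a parameter here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical shape of the model-specific entry point. */
typedef void (*ex_class_fn_t)(void *cookie, const char **cpuclsp,
    const char **leafclsp);

/* Nonzero only if the entry point filled in both strings. */
int
ex_cms_ereport_class(ex_class_fn_t fn, void *cookie,
    const char **cpuclsp, const char **leafclsp)
{
	*cpuclsp = NULL;
	*leafclsp = NULL;
	if (fn == NULL)
		return (0);
	fn(cookie, cpuclsp, leafclsp);
	return (*cpuclsp != NULL && *leafclsp != NULL);
}

/* Consumer side: join a caller-supplied prefix with both strings. */
int
ex_build_class(char *buf, size_t len, const char *prefix,
    const char *cpucls, const char *leafcls)
{
	return (snprintf(buf, len, "%s.%s.%s", prefix, cpucls, leafcls));
}

/* Sample entry point using the strings from the example in the text. */
static void
ex_model_class(void *cookie, const char **cpuclsp, const char **leafclsp)
{
	(void) cookie;
	*cpuclsp = "amd64";
	*leafclsp = "dcache_parity";
}
```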
cms_ereport_detector
--------------------
extern nvlist_t *cms_ereport_detector(cpu_t *, void *, nv_alloc_t *);
Calls the cms_ereport_detector entry point in model-specific support,
if it exists; otherwise returns NULL. The model-specific support
should construct an FMRI for the error detector using the
cpu info and cookie (from cms_disp_match) provided, and should use
the provided nv_alloc_t allocator in manipulating nvlists.
If NULL is returned then the consumer is responsible for constructing
some form of detector FMRI.
cms_ereport_includestack
------------------------
extern boolean_t cms_ereport_includestack(cpu_t *, void *);
Calls the cms_ereport_includestack entry point in model-specific support,
if present; otherwise returns B_FALSE. Model-specific code should
indicate whether a call stack was captured for this error and, if so,
whether that should be included in the ereport payload.
cms_ereport_add_logout
----------------------
extern void cms_ereport_add_logout(cpu_t *, nvlist_t *, nv_alloc_t *, int,
uint64_t, uint64_t, uint64_t, void *, void *);
Calls model-specific code to allow it to add model-specific payload
information to the ereport already partially-constructed in the
nvlist_t passed as second argument. Further nvlist manipulation should
use the provided nv_alloc_t. The remaining arguments are the bank number,
MCi_{STATUS,ADDR,MISC} of the error, a pointer to the model-specific
logout area, and the cookie previously returned by cms_disp_match.