Network Auto-Magic Architecture
Version 2.0, 2007-Feb-16
There are six focus areas described below:
- Overview & Component Interaction
- State Machine
- Event Handler
- Profiles
- Network Service Model
- Dependencies with the rest of the System
There are also two appendices:
- Glossary
- Revision History
0. Architecture vs. Design
This Architecture document was completed in April 2006, then used as the
basis for the Design Document, which was completed in February 2007. Most of
the high-level plans laid out here can be found in a similar lower-level form
there, though, as one might expect, some changes were made as part of the
design process:
- Link-Layer Profiles (LLPs) were renamed to Network Configuration Profiles
(NCPs), and their form changed somewhat. LLPs were per-link, whereas an
NCP describes the entire system, and is made up of NCUs (U == Units), where
each NCU describes a link or an IP interface.
- Upper-Layer Profiles (ULPs) were renamed to Environments, but were otherwise
little changed.
- The State Machine / Event Handler is largely similar at a high level, but
its functional details are rather different.
- Our Network Service Model is largely similar, but again, some details have
changed.
Keep these changes in mind when reading this document.
1. Overview & Component Interaction
1.1 Introduction
Network Profiles, the primary component of the Network Auto-Magic project,
are a way to simplify network configuration management. They work by allowing
users to specify various properties which determine how things work in
different circumstances. The properties include, but are not limited to:
- Link-Layer
- which network interface(s) to use
- how to obtain IP address(es) for the interface(s) in use
- whether or not a given link should be configured automatically
- parallel interfaces to the same subnet (i.e., link aggregation
and IP Multipathing)
- the relative priority vs. other Link-Layer profiles
- Upper-Layer
- conditions under which this profile should be activated
- which name service(s) to use
- a host name (and any required variations thereof)
- routing information
- a set of IP filter rules
- smf(5) services
- user-specified post-activation "hook"
- the relative priority vs. other Upper-Layer profiles
Note that this dual-layer model was chosen to support "overlay" profiles,
as discussed in our Story Boards and Requirements. Examples are provided
below to illustrate this.
1.2 Overview
Let us begin with an architectural overview. The primary components are:
- The profile repository. This is where the configuration program
stores its data, which will also be read by the profile daemon.
- The profile configuration program (a.k.a. the UI).
- Note that there will be both CLI and GUI versions of this program
which will perform similar if not identical tasks.
- In addition to using the repository, it also interacts with the
profile daemon.
- Tasks which users will use this program to perform include:
- creating, modifying and deleting profiles
- activating one or more profiles
- querying information about profiles
- The profile daemon.
- This reads data from the repository.
- It reacts to events as notified by the event handler.
- It reacts to changes which users make via the configuration program.
- The "state machine" described in Section 2 is implemented in this
daemon.
- The daemon also interacts with the SMF network services.
- The event handler. This will likely have at least some kernel component,
which will report information about events. A user-land component
will gather this information and report it to the profile daemon.
The user-land component may exist within the profile daemon itself.
- The SMF network services. These are already part of Solaris, but
we expect to modify them to some extent. The daemon will restart or
refresh some of these services as needed.
1.3 Interactions
How they interact is roughly as follows:
- At any given time, one or more Link-Layer profiles and exactly one
Upper-Layer profile are "active".
- At boot, the profile daemon consults the repository for the current
active Link-Layer profile(s), proceeds until one or more IP addresses
have been configured, checks the conditions of the Upper-Layer profiles,
activates the highest priority one whose conditions match, and configures
the network(s) accordingly. It is not yet clear whether:
- the active profile(s) is/are always persistent across reboots
or
- there may be support for temporarily active profiles which do
not persist across reboots
- As events occur which may trigger a change in the network configuration,
the event handler detects these and notifies the daemon accordingly.
The daemon in turn consults the active profiles and may reconfigure the
network(s) accordingly. Note that some of these events may indicate
that the conditions have changed.
- A change in conditions may trigger activating a different Link-Layer
profile, which may in turn trigger a change in the Upper-Layer profile,
which may in turn affect the network configuration. A change in
conditions may also trigger activating a different Upper-Layer profile
directly, without changing the underlying Link-Layer profile(s).
- If a user modifies a profile, the configuration program updates the
repository and notifies the daemon. If the current active profile is
modified, then the daemon may reconfigure the network(s) accordingly.
- Likewise, if a user activates a new profile (at either layer), then the
configuration program updates the repository and notifies the daemon,
which may then reconfigure the network(s) accordingly. Note that a user
can always manually activate a profile (at either layer), regardless of
conditions. Also note that users who desire total control will be able
to specify conditions such that a different profile is never activated
automatically.
1.4 Examples
- There will likely be an out-of-the-box pair of profiles for "no network"
which specify files for everything in /etc/nsswitch.conf, disable services
which make no sense in a stand-alone environment, etc. Then at boot, the
profile daemon would consult the conditions, note that there was no
networking, then automatically select that pair of profiles.
- A user at Sun might specify an Upper-Layer "SWAN" profile:
- conditions of the form "apply when a wired network with IP addresses
in the range 129.144.0.0/12 is detected"
- a property to use name server X
- a property to use files/dns/nis or files/nis in /etc/nsswitch.conf
- a property to use NIS server Y
- enable the SMF service nis/client
- etc.
- A user might specify a Link-Layer profile which specifies "when I detect
a WLAN on interface bcmndis0 with ESSID X and BSSID Y, then use DHCP to get
an IP address", then a related Upper-Layer profile whose conditions activate
it contingent upon the above, then have a user "hook" to punchin, which in
turn activates the punchin Link-Layer profile, which creates an IPsec
tunnel, which ultimately leads to the Upper-Layer "SWAN" profile being
activated.
2. State Machine
One of our focus areas is "State Machine", which needs to cover both the
abstract set of states for the profile daemon, and the set of possible
transitions between those states. For now, we will focus on the transitions,
with the idea that sufficiently specifying the transitions may suggest what
the states themselves should be.
2.1 Initial Conditions
Since the new network/profile service will be replacing the existing
network/physical and network/initial services (see §5.2), all of the
start-up functionality of those two services will need to be accounted for.
Though the state machine is designed to make bringing up an interface at
boot time as similar as possible to bringing up an interface on a running
system, a few initialization tasks will be required:
- Plumbing of loopback interfaces.
- Walking the device tree to find existing network links. During
normal operation, we would receive notification of the arrival
of a new link; this notification kicks off the process of bringing
up that link. At boot time, those event notifications will not
occur, so we will have to find all existing links and "manually"
kick off the bring-up of those links.
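The boot-time walk can be thought of as synthesizing the link++ notifications that were never delivered. The sketch below assumes the walk itself has already produced a list of link names (how the device tree is walked is not shown), and the `Event` tuple and `boot_events` helper are hypothetical names used only for illustration; loopback is skipped because it is plumbed as a separate task.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """Hypothetical life-cycle event, as named in Section 2.2."""
    kind: str  # "link++", "network++", "ip++", ...
    link: str  # link name, e.g. "bge0"


# Loopback is plumbed unconditionally as its own initialization task.
LOOPBACK = "lo0"


def boot_events(discovered: List[str]) -> List[Event]:
    """Turn the links found by the boot-time device-tree walk into
    synthetic link++ events, so that boot follows the same bring-up
    path as a link that appears on a running system."""
    return [Event("link++", name)
            for name in sorted(set(discovered))
            if name != LOOPBACK]
```

For example, `boot_events(["bge0", "lo0", "pcwl0", "bge0"])` yields one link++ event each for bge0 and pcwl0, which are then handled exactly like hot-plug arrivals.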
2.2 Event-Driven Transitions
A transition is made up of a series of reactions to a given event. These
events come in pairs: some new thing is available, some old thing is no
longer available. There are three event pairs which frame the "life-cycle"
of a link:
- link is created:
link++
A new link has been added to the system. Common possible reasons:
- a NIC is hot-plugged/DR inserted
- a new tunnel has been configured
The system reacts by:
- plumbing the link (if necessary)
- if the link is wireless, attempting to connect: scan for APs, consult
list of "known" ESSIDs for matches (may also have to explicitly try
connecting to user-specified ESSIDs that do not advertise). If an
available ESSID matches one on our list, connect. If not (depending
on policy), present available ESSIDs to user for selection.
- link comes up (RUNNING flag is set):
network++
A link has become available for use. Common possible reasons:
- a LAN cable is plugged in
- a wireless link has connected to an AP
The system reacts by:
- Gathering information about the network: DHCP server availability,
VLANs and/or subnets present
- Consulting the Link-Layer profile to determine if this link should be
configured. If no, note its availability, but leave link down and do
nothing else; if yes, bring link up.
- link gets IP address:
ip++
An IP address has been assigned to the link. Common possible reasons:
- DHCP lease has been obtained
- IPv6 stateless address autoconf has completed
- a link-local address has been configured
- a static address has been assigned
The system reacts by:
- Consulting Upper-Layer profiles to decide what higher-layer
configuration is required; this could include applying completely
new configuration (for name services, ipfilter, etc.), restarting
existing services, or possibly doing nothing.
- Activating new configuration as needed.
- link loses IP address:
ip--
An IP address has been removed from the link. Common possible reasons:
- DHCP lease has expired
- an autoconfigured address has timed out
- a link-local address has been removed
- a static address has been removed
The system reacts by:
- Consulting Upper-Layer profiles to decide what higher-layer
configuration changes, if any, are required.
- Activating new configuration as needed.
- link goes down:
network--
A link is no longer available for use. Common possible reasons:
- a LAN cable is unplugged
- a connection with an AP is lost
The system reacts by:
- If link was not in use (UP flag is off), just deleting
its info from our list of available links.
- Consulting Link-Layer profile to determine if other RUNNING
links should be brought up, and then following the rest of the
network++ process for those links as needed.
- link is removed:
link--
A link has been removed from the system. Common possible reasons:
- a NIC is unplugged/DR removed
- an existing tunnel has been torn down
The system reacts by:
- If link was up (RUNNING), taking it down (this will
cause "link goes down" steps to take place).
- Unplumbing the link.
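The six events above map naturally onto a dispatch table with one reaction per event. This is a minimal sketch of that structure, not the daemon's design: `LinkTracker` and its handlers are hypothetical, they record the action they would take rather than touching the system, and the wireless scan and "UP flag is off" details are left out for brevity.

```python
from typing import Callable, Dict, List, Set


class LinkTracker:
    """Hypothetical per-link bookkeeping for the profile daemon.

    Each handler appends the reaction it would take (Section 2.2) to
    self.actions instead of reconfiguring the real system."""

    def __init__(self) -> None:
        self.plumbed: Set[str] = set()
        self.running: Set[str] = set()
        self.actions: List[str] = []
        # Dispatch table: one reaction per life-cycle event.
        self._handlers: Dict[str, Callable[[str], None]] = {
            "link++": self._link_created,
            "network++": self._network_up,
            "ip++": self._ip_changed,
            "ip--": self._ip_changed,
            "network--": self._network_down,
            "link--": self._link_removed,
        }

    def handle(self, kind: str, link: str) -> None:
        self._handlers[kind](link)

    def _link_created(self, link: str) -> None:
        self.plumbed.add(link)
        self.actions.append(f"plumb {link}")

    def _network_up(self, link: str) -> None:
        self.running.add(link)
        self.actions.append(f"check Link-Layer profile for {link}")

    def _ip_changed(self, link: str) -> None:
        self.actions.append("re-evaluate Upper-Layer profiles")

    def _network_down(self, link: str) -> None:
        self.running.discard(link)
        self.actions.append("check Link-Layer profile for other links")

    def _link_removed(self, link: str) -> None:
        if link in self.running:
            # Removing a RUNNING link triggers the "link goes down" steps.
            self._network_down(link)
        self.plumbed.discard(link)
        self.actions.append(f"unplumb {link}")
```

Note how link-- reuses the network-- handler, mirroring the text: removing a link that is still RUNNING implies the "link goes down" steps before the link is unplumbed.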
Note that booting and resuming from suspend are really just special
cases where one or more of the ++ cases appear to happen at once, as
the daemon will attempt to "de-queue" all pending events whenever it
starts or resumes (i.e., it will attempt to examine all pending events
before handling any of them). This will be part of the daemon's
"damping" to maximize stability (more on this below).
Likewise, shutdown and suspend are really just special cases where the --
events happen on all links at the same time.
Generally, the normal sequence for an interface being added to the system
would be link++, followed by network++, followed
by one or more ip++. When an interface is removed, there would
be a network-- followed by a link-- event; it
is quite likely that there would not be explicit ip-- events
initiating the process.
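"De-queueing" everything before acting gives the daemon a chance to discard transitions that cancel out. The sketch below shows one possible damping rule, assuming events are plain `(kind, link)` pairs: an event is dropped together with the most recent surviving event on the same link when the two are exact opposites. Both the rule and the `damp` helper are hypothetical. A real implementation would also key ip++/ip-- on the address, and the text below refines network-- / network++ damping with a same-network check that this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # (kind, link), e.g. ("network++", "bge0")


def opposite(kind: str) -> str:
    """Return the inverse event kind, e.g. network++ <-> network--."""
    base, sign = kind[:-2], kind[-2:]
    return base + ("--" if sign == "++" else "++")


def damp(pending: List[Event]) -> List[Event]:
    """Examine every queued event before acting on any of them, dropping
    an event together with the most recent surviving event on the same
    link when the two are exact opposites (e.g. network-- then network++)."""
    out: List[Optional[Event]] = []
    stacks: Dict[str, List[int]] = {}  # link -> indices of surviving events
    for kind, link in pending:
        stack = stacks.setdefault(link, [])
        if stack and out[stack[-1]] == (opposite(kind), link):
            out[stack.pop()] = None  # the pair cancels: drop both events
            continue
        stack.append(len(out))
        out.append((kind, link))
    return [event for event in out if event is not None]
```

Only the most recent surviving event per link is compared, so a sequence such as network++, ip++, network-- is kept intact rather than leaving an orphaned ip++ for a link that has gone down.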
An interesting case would be when a network-- is not
followed by a link--. It could be a transient failure,
or it could be a move from one network to another (in which case a
network++ would follow at some point), or it could simply be
that that link is "gone," whether by admin choice or because of a longer-term
failure. Since the network is gone, we can do nothing with respect to it per
se. But we can start a timer, then once that timer "pops" (per the profile),
we might either reset all connections (if the number of networks is now
0) or try to get all services using the "dead" network to transition to
one of the other networks (if the number of networks is now ≥ 1).
Also note that when the timer pops, we set the state so that a
subsequent "network++" event follows the "there was no
previous network" path rather than the "there was a previous network"
path. But note that if we get a "network++" event before
the timer pops, and determine that the "new network" is the same as
the "old network", then we will attempt to "damp" the events out and
act as if neither event had occurred.
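The timer behaviour described above can be sketched as a small table of "lost" networks with deadlines. Everything here is an assumption made for illustration: the `NetworkTimers` class, the idea of a `network_id` (such as a subnet or ESSID/BSSID) used to decide whether the "new network" is the same as the "old network", and a single `grace_seconds` value taken from the profile. Time is passed in explicitly so the logic can be exercised without real timers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LostNetwork:
    network_id: str  # hypothetical identity, e.g. subnet or ESSID/BSSID
    deadline: float  # time at which the timer "pops"


@dataclass
class NetworkTimers:
    grace_seconds: float  # assumed to come from the active profile
    active: Dict[str, str] = field(default_factory=dict)  # link -> network
    lost: Dict[str, LostNetwork] = field(default_factory=dict)

    def network_down(self, link: str, now: float) -> None:
        """network--: remember the lost network and start its timer."""
        net = self.active.pop(link, None)
        if net is not None:
            self.lost[link] = LostNetwork(net, now + self.grace_seconds)

    def network_up(self, link: str, network_id: str, now: float) -> str:
        """network++: pick the path the rest of the bring-up follows."""
        prev: Optional[LostNetwork] = self.lost.pop(link, None)
        self.active[link] = network_id
        if prev is None or now >= prev.deadline:
            return "no previous network"
        if prev.network_id == network_id:
            return "damped"  # same network came back in time: ignore both
        return "previous network"

    def expire(self, now: float) -> Dict[str, str]:
        """Pop expired timers, choosing reset vs. migration per link."""
        actions: Dict[str, str] = {}
        for name in [n for n, p in self.lost.items() if now >= p.deadline]:
            del self.lost[name]
            actions[name] = ("reset all connections" if not self.active
                             else "migrate services to remaining networks")
        return actions
```

The reset-or-migrate decision in `expire` follows the text directly: with no networks left, connections are reset; with at least one remaining, services using the "dead" network are moved.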
2.3 User-Driven Transitions
There are also user-driven transitions: whenever a user modifies the active
profile (of either layer), or activates a different profile (at either layer),
then the new active profile(s) may result in a transition. Depending on the
change(s), there may be nothing to do, or there may be minor reconfigurations
to make, or it may be that the user did the equivalent of pressing a giant
red "reset" button.
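Deciding which of these cases applies amounts to comparing the configuration before and after the change, which also anticipates the abstract "state" discussed in §2.4. A minimal sketch, assuming a hypothetical `NetState` snapshot whose fields are illustrative only:

```python
from dataclasses import dataclass, fields
from typing import FrozenSet, List


@dataclass(frozen=True)
class NetState:
    """Hypothetical abstract snapshot of the network configuration;
    the fields are illustrative, not a settled design."""
    links_up: FrozenSet[str]
    addresses: FrozenSet[str]
    name_services: FrozenSet[str]
    upper_layer_profile: str


def changed_parts(old: NetState, new: NetState) -> List[str]:
    """Names of the parts of the state that differ. An empty list means
    there is nothing to do; a change to every part is roughly the
    equivalent of pressing the giant red reset button."""
    return [f.name for f in fields(NetState)
            if getattr(old, f.name) != getattr(new, f.name)]
```

Because the snapshot is immutable, the daemon could keep the previous one around and hand the list of changed parts to whatever code performs the minor reconfiguration.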
A note on "punchin" (the IPsec-based VPN which many of us use to access the
SWAN remotely): although tunnels coming and going should be detected by the
event handler and thus be handled by the profile daemon as an event-driven
transition, it would probably be better for us to work with the punchin team
to integrate our stuff together so that punching in and out would involve
using our interfaces, and thus be user-driven transitions, with the profile
daemon doing the heavy lifting instead of the punchin script.
2.4 Implications
It is not clear if these transitions suggest any sort of traditional simple
state model. E.g., the Zones model whose primary states are Configured,
Installed and Running seems impossibly simplistic for what we are trying
to achieve. Instead, it seems that we ought to come up with an abstract
representation of the network configuration, and that abstraction will become
the "state". Then whenever the user modifies the active profile or activates
a different profile, the network configuration will be changed accordingly,
as will our abstract "state". Likewise, whenever an event forces a
reconfiguration, the new configuration will be reflected in our abstract
"state".
3. Event Handler
The event handler must interface with the kernel, but will probably
mostly run in user-land.
The event handler will monitor several sources of information: hald, routing
socket, sysevents (and, longer term, link FMRIs); current thinking is that
this monitoring will take place within the "profile daemon" entity.
In addition to the monitoring component, work may be required in
the kernel to ensure that information is reported in a consistent
manner; hald back-end support may also need to be added (this will
benefit other projects as well as this one).
3.1 Information & Events
What information needs to be delivered? What events are we concerned with?
- Link creation/removal (signals that a link exists)
- Physical hardware: card removal / insertion (note that one
card may lead to multiple links being created -- e.g., qfe).
- This should cover both DR and PCMCIA/cardbus hot-plug
- Virtual links: creation / removal of:
- IP Tunnels
- Link aggregations
- VLANs
- Future: Crossbow VNICs?
- Link up/down (signals the link is or is not available for use)
- On wired networks: is the cable plugged in?
- On wireless networks: are we connected?
- Link health (signals how healthy the link is)
- On wireless networks: what is the signal strength?
- Future: error rate heuristics; failed links in an aggregation.
- Link availability (which networks are available)
- On wireless: available networks/APs
- On wired/wireless: IP address assigned
3.2 Obtaining Information
- link creation/removal
This covers both physical links (e.g., [un]plugging a NIC) and virtual
links (e.g. a tunnel, or an aggregation).
We can subscribe to event notifications using the sysevent user
subscription API; EC_DEV_ADD and EC_DEV_REMOVE
classes for subclass ESC_NETWORK should do the trick.
One possible complication: the sysevents report device names,
based on the device driver name. Clearview vanity names will
introduce the notion of link names; we will need to map between
the two, as we want users to be able to talk about link names,
not device names.
It has been suggested that we really should only care about the creation
of new links, and not devices. Clearview will add sysevents that
cover link creation/removal (refer to Clearview
UV design doc, §6.2.7).
- link up/down
DL_NOTE_LINK_UP/DOWN translates to toggling of
IFF_RUNNING which can be monitored on a routing socket.
Support for DL_NOTE_LINK_UP/DOWN is not consistent
across all drivers. We should plan to make sure that support is in
place for the most commonly used of Sun's drivers, either by doing
the work ourselves or by working with the driver teams to do it, and
we will still need to work with drivers that do not support
DL_NOTE_LINK_UP/DOWN.
For wireless drivers, the IFF_RUNNING flag should represent whether
or not there is a connection, either to an AP or an ad-hoc network.
However, it does not appear that this is part of the existing interface
to which WiFi drivers are being written/ported. We should look into
adding this to the interface; but if that is not possible, then the
connected state of a wireless driver should be queryable.
- link health/link availability: wireless-specific information
An earlier PSARC case (whose official title was WiFi PCMCIA
Driver Productization, although it is colloquially known as
wificonfig) defines a set of (unstable) wireless driver ioctls
that make up the interface with wificonfig. Those interfaces are
likely to change, though, as work in that area is going on right now.
A new library resulting from that work will be available to us, and will
allow us to query signal strength and request a scan for available networks.
This information will not be reported as an asynchronous event; the
event handler will need to query for this information. Preferred
behavior is always to query when wireless interfaces are present,
whether or not the interface is currently attached; if the scans
create performance problems, the rate may be reduced, or the scans
could be eliminated altogether.
There may also be roaming features (discussed in §3.5 below) which make querying for
available APs necessary even when connected.
- link availability: IP address
This is easy enough to get from a routing socket.
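Once a routing-socket message has been decoded, turning it into the events of §2.2 is mostly a matter of comparing before and after. This sketch assumes the decoding is already done and only shows the translation step. The flag values are those of the traditional BSD/Solaris `<net/if.h>` and are stated here as an assumption rather than read from the system headers; the `VANITY` map standing in for Clearview device-to-link naming, and the link name "net0", are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Assumed values from the traditional BSD/Solaris <net/if.h>.
IFF_UP = 0x1
IFF_RUNNING = 0x40

# Hypothetical device-name -> link-name map (Clearview vanity names).
VANITY: Dict[str, str] = {"bge0": "net0"}


def link_name(device: str) -> str:
    """Report link names rather than device names where a mapping exists."""
    return VANITY.get(device, device)


def flags_to_event(device: str, old_flags: int, new_flags: int) -> Optional[str]:
    """Translate an IFF_RUNNING toggle seen on a routing socket into
    network++ / network--; other flag changes produce no event."""
    was_running = bool(old_flags & IFF_RUNNING)
    is_running = bool(new_flags & IFF_RUNNING)
    if was_running == is_running:
        return None
    kind = "network++" if is_running else "network--"
    return f"{kind} {link_name(device)}"


def addrs_to_events(device: str, old: Set[str], new: Set[str]) -> List[str]:
    """Translate address additions/removals into ip++ / ip-- events."""
    name = link_name(device)
    return ([f"ip++ {name} {a}" for a in sorted(new - old)] +
            [f"ip-- {name} {a}" for a in sorted(old - new)])
```

Wireless signal strength and scan results are deliberately absent: as noted above, they are polled rather than delivered as asynchronous events.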
3.3 Delivering Information
How will the information be delivered to consumers?
This is probably a more detailed design question, as the current thinking
is that the event handler will be part of the profile daemon entity, our
primary consumer.
3.4 Storing Information
Will the information be stored anywhere? If so, when should
snapshots be taken? How many should be stored?
It is not clear that we need a repository at this point.
3.5 Roaming
Roaming can be broken up into two different layers: L2 and L3. L2 is the
case where there are multiple APs on the same LAN; the IP address does not
change when migrating from one AP to another. This is something that hardware
implementations (either in the hardware itself or in the driver) seem to take
care of (based on casual Googling and toting OS X and Solaris laptops around
the building).
L3 is harder. MobileIP tries to make it transparent; that is something that
we might consider doing at some point under the NWAM umbrella, but will not
be part of phase 1 of the project. It has also been suggested that there
are some cases where simply making the switch, without worrying about keeping
existing connections alive, might be preferred behavior. This sort of gets
back to the question of intent we have discussed elsewhere: is the user just
surfing and checking mail, so switching to a cheaper network when it pops up
is painless; or does s/he have long-term ssh sessions that must stay up at
all costs?
3.6 How to respond to external configuration changes?
- Types of changes
There are two types of changes which an administrator can make: changes
which affect one or more of the Conditions, and changes which affect
one or more of the Profile Attributes. E.g., taking an interface down
via ifconfig would be an example of the former, while using svcadm to
disable the ntp/client service would be an example of the latter.
- Responding to these changes
Changes in the conditions need to be propagated: that is part of what
Network Auto-Magic is all about. So e.g., using ifconfig to take an
interface down would have the same effect as if a LAN cable had been
unplugged from the NIC corresponding to that interface, thus causing
a change in conditions, so the profile daemon would have to reevaluate
them, and possibly activate a different profile as a result.
Changes in state/configuration prescribed by profile attributes result
in inconsistencies with the profile. There are two problems with
this: (1) the inconsistency might confuse users in the future, and
(2) the user might have made a (very expensive) mistake. Fixing (1)
seems to be outside the scope of NWAM. NWAM seems to make fixing (2)
possible (e.g., by undoing the user's action), though it is not clear
that it is worth the additional complexity.
As to whether a given change affects a Condition, a Profile Attribute,
neither, or both, we will need to come up with a list of things which
we will need to monitor (e.g., network service states, network interface
states, etc.), and classify each, as part of the low-level design of
this project. But for now we suspect that there will be few if any
things in both lists, and if there are any, they can be handled as
special cases.
4. Profiles
Specification of network configuration is divided into two types of profiles.
The first is applied at the link++ and network++ stages of a link's life.
It is made up of the conditions and attributes
required to determine which links should be used and how those links should
be configured. This will be referred to as the Link-Layer Profile.
The second is applied at the ip++ stage of a link's life, though
not every ip++ or ip-- transition will result in a
configuration change. It is made up of the conditions and attributes required
to configure higher-layer aspects of a link: things like name services,
firewall rules, or proxies. This part of the configuration is not dependent
on how IP connectivity is achieved, just that it exists on some set
of links. This will be referred to as the Upper-Layer Profile.
4.1 Link-Layer Profile
Links are configured individually, as link++ and network++ transitions
occur (see §2.2). A Link-Layer profile contains
attributes used to configure a link. A rule-set will determine
link/profile mappings; some of the types of mappings that should
be allowed are:
- profile foo applies to all links
- profile foo applies to all wired links, profile bar to all wireless links
- profile foo applies to bge2, profile bar applies to all other links
- profile foo applies to all wireless links; prefer pcwl0 over bcmndis0
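One way to express such a rule-set is an ordered list of rules where the first match wins, plus an optional preference order among matched links. This is a sketch under those assumptions; `Rule`, `map_links` and `preferred_order` are hypothetical, and first-match ordering is only one possible semantics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Link:
    name: str
    wireless: bool


@dataclass
class Rule:
    """Hypothetical link-to-profile mapping rule."""
    profile: str
    link: Optional[str] = None       # one specific link, e.g. "bge2"
    wireless: Optional[bool] = None  # None: wired and wireless alike
    prefer: Tuple[str, ...] = ()     # preferred links, best first

    def matches(self, link: Link) -> bool:
        if self.link is not None:
            return link.name == self.link
        return self.wireless is None or self.wireless == link.wireless


def map_links(rules: List[Rule], links: List[Link]) -> Dict[str, str]:
    """Assign each link the profile of the first rule that matches it,
    so specific rules must be listed before general ones."""
    mapping: Dict[str, str] = {}
    for link in links:
        for rule in rules:
            if rule.matches(link):
                mapping[link.name] = rule.profile
                break
    return mapping


def preferred_order(rule: Rule, links: List[Link]) -> List[str]:
    """Links matched by a rule, with the rule's preferred links first."""
    rank = {name: i for i, name in enumerate(rule.prefer)}
    names = [link.name for link in links if rule.matches(link)]
    return sorted(names, key=lambda n: rank.get(n, len(rank)))
```

The third example above ("foo applies to bge2, bar to all other links") becomes `[Rule("foo", link="bge2"), Rule("bar")]`, and the fourth becomes `Rule("foo", wireless=True, prefer=("pcwl0", "bcmndis0"))`.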
The following attributes make up a Link-Layer profile:
- how to obtain IP address(es) for the link(s) in use
- Whether address(es) should come from DHCP, be statically
assigned, be link-local, be auto-configured, etc. It might
be useful to be able to specify that an appropriate address
should be chosen from a user-specified pool of addresses
(which would be available to multiple links).
- Multiple sources might be a possibility (nsswitch-like
mechanism?)
- If static, should be able to specify multiple addresses.
- whether or not a given link should be configured automatically
- Should this link be configured automatically, without user intervention?
- Default is true for wired interfaces, false for wireless interfaces.
- ESSIDs that can be joined automatically
- Should be generated automatically as knowledge of different
ESSIDs is acquired; user may also manually add to it.
- Will store authentication information for the ESSID, such as
security model and keys.
- Will have default ranking rules, but the user should be able
to modify these.
- relative priority vs. other Link-Layer profiles
- Stored as an integer, but might be presented to the user
differently.
- Used to decide which Link-Layer profile should be used; the system
will apply the profile with the highest priority among those
that have the necessary links available.
- May have multiple profiles at the same priority level; in
this case, as many as possible should be used (unless another
profile with higher priority is also available).
- Priority might be part of the rule-set used to select a profile,
rather than part of the profile itself.
- parallel interfaces to the same subnet (i.e., link aggregation
and IP Multipathing)
- This is an advanced feature which may be added down the road.
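The priority rule above (highest priority among profiles whose links are available, with ties all used) can be sketched as follows. The sketch assumes that a larger integer means higher priority and that each profile lists the links it needs; `LinkLayerProfile` and `select_llps` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class LinkLayerProfile:
    name: str
    priority: int          # larger means preferred (an assumption)
    needs: FrozenSet[str]  # links that must be available


def select_llps(profiles: List[LinkLayerProfile],
                available: FrozenSet[str]) -> List[str]:
    """Return every usable profile at the highest usable priority level;
    several profiles at the same level are all used."""
    usable = [p for p in profiles if p.needs <= available]
    if not usable:
        return []
    best = max(p.priority for p in usable)
    return sorted(p.name for p in usable if p.priority == best)
```

If priority ends up belonging to the rule-set rather than to the profile, as the last bullet suggests, the same selection would simply read the priority from the matching rule instead.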
4.2 Upper-Layer Profile
A set of these properties and resources is applied to the system once
IP services are available (i.e. IP addresses are configured on running
interfaces). Unlike Link-Layer profiles, only one Upper-Layer profile
may be active at any time. Upper-Layer profile selection criteria include:
- specific link(s) that is(are) up
- wireless network/ESSID to which we have connected
- IP address range/subnet
- domain look-up (perhaps from DHCP server, since we might not have
configured name services yet)
- user input (user explicitly chooses a profile; if it cannot be done with
the currently available links, complain)
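Since only one Upper-Layer profile may be active, selection reduces to "highest priority whose conditions match", with an explicit user choice taking precedence. A minimal sketch, assuming conditions are predicates over a hypothetical snapshot of observed facts; `UpperLayerProfile`, `in_subnet`, `essid_is` and `select_ulp` are illustrative names, and the "complain if the available links cannot support the user's choice" check is not modelled.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Any, Callable, Dict, List, Optional

Observed = Dict[str, Any]  # hypothetical snapshot: addresses, essid, ...
Condition = Callable[[Observed], bool]


@dataclass
class UpperLayerProfile:
    name: str
    priority: int  # larger means preferred (an assumption)
    condition: Condition


def in_subnet(prefix: str) -> Condition:
    """Condition: some configured address lies within the given subnet."""
    net = ip_network(prefix)
    return lambda obs: any(ip_address(a) in net for a in obs["addresses"])


def essid_is(essid: str) -> Condition:
    """Condition: we are connected to the given wireless network."""
    return lambda obs: obs.get("essid") == essid


def select_ulp(profiles: List[UpperLayerProfile], obs: Observed,
               user_choice: Optional[str] = None) -> Optional[str]:
    """Pick exactly one Upper-Layer profile, or None if nothing matches."""
    if user_choice is not None:
        # Manual activation is honoured regardless of conditions (Section 1.3).
        return user_choice
    matching = [p for p in profiles if p.condition(obs)]
    if not matching:
        return None
    return max(matching, key=lambda p: p.priority).name
```

With the "SWAN" example of §1.4, `in_subnet("129.144.0.0/12")` would be that profile's condition.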
The following attributes make up an Upper-Layer profile:
- conditions under which this profile should be activated
- relative priority vs. other Upper-Layer profiles
- which name service(s) to use
- a host name (and any required variations thereof)
- routing information
- a set of IP Filter rules
- smf(5) services
- HTTP proxies
- "hook" mechanism: user-specified action taken when profile is activated
4.3 User Intent
An issue which has come up repeatedly during design discussions has been
that of "user intent". For example, laptops are used very differently
than servers and test servers may be used very differently than production
servers. So a knob to indicate this intent seems like a good idea.
The form this knob may take needs to be worked out.
5. Network Service Model
This section might also be called "how we interact with SMF".
5.1 Milestones
There will be two system network states, each of which is represented by
an SMF milestone service:
- milestone/network: basic network APIs are functional (i.e., applications
can make socket() calls) and any configured packet filtering
is enabled on the available network interfaces.
- milestone/name-services: one or more name services (e.g., DNS, NIS,
NIS+, LDAP) are selected, each selected service is configured properly.
milestone/name-services should depend on milestone/network.
The profile daemon (profiled) will manipulate these two milestones.
When network events happen or the active profile changes (via any of
the mechanisms described in §2.2 and §2.3), profiled will decide the
state to which each of these milestones should be set. When conditions
change sufficiently (again, see §2.2 and §2.3), milestone/network will
likely need to be refreshed. Network services should depend on either
milestone/network or milestone/name-services. We need to identify the
complete lists of such services and update their dependencies accordingly.
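The decision profiled makes for the two milestones can be sketched as a pure function of a few observed facts. The inputs are assumptions chosen to mirror the milestone definitions above (network APIs usable, packet filtering in place, selected name services configured); `milestone_states` is hypothetical and does not talk to SMF.

```python
from typing import Dict, Iterable


def milestone_states(apis_ready: bool, filtering_ready: bool,
                     selected: Iterable[str],
                     configured: Iterable[str]) -> Dict[str, str]:
    """Hypothetical decision profiled might make for the two milestones.

    filtering_ready should be True when no packet filtering is configured.
    milestone/name-services can only be online when milestone/network is,
    mirroring the dependency described above."""
    network = apis_ready and filtering_ready
    wanted = set(selected)
    names = network and bool(wanted) and wanted <= set(configured)
    return {
        "milestone/network": "online" if network else "offline",
        "milestone/name-services": "online" if names else "offline",
    }
```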
5.2 Profile Daemon
A new SMF service, network/profile, will start profiled. It should depend
only on network/pfil[1] and should be started very early. The current
network configuration services (network/physical, network/initial and
network/service) should be removed and most of their tasks[2] taken over
by the profile daemon.
Based on the information from the active profile, the profile daemon may
also enable or disable some name services. For example, when the system
migrates from a profile which specifies using DNS to a profile which
specifies using NIS, profiled should disable dns/client and enable
nis/client.
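The DNS-to-NIS example amounts to a set difference between the old and new profiles' name services. This sketch only computes the `svcadm disable` / `svcadm enable` command lines it would run, leaving execution (for example via `subprocess.run`) to the caller; the `NS_FMRI` table and `ns_transition` helper are hypothetical.

```python
from typing import List, Set

# Hypothetical mapping from a profile's name-service keyword to its FMRI.
NS_FMRI = {
    "dns": "svc:/network/dns/client",
    "nis": "svc:/network/nis/client",
    "ldap": "svc:/network/ldap/client",
}


def ns_transition(old: Set[str], new: Set[str]) -> List[List[str]]:
    """svcadm command lines that move the system from the old profile's
    name services to the new profile's; services in both are untouched."""
    cmds = [["svcadm", "disable", NS_FMRI[s]] for s in sorted(old - new)]
    cmds += [["svcadm", "enable", NS_FMRI[s]] for s in sorted(new - old)]
    return cmds
```

Services common to both profiles are left alone, so a change that does not touch name services produces no commands at all.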
[1] network/pfil is an existing SMF network service. It autopushes
pfil onto the filtering interface and restricts network traffic
during startup. It does not depend on anything.
[2] Some of the pieces, such as ipqos and IPsec, need to be split out
into separate services. More details of this will be worked out
as we migrate from architecture to design. The IPsec part is
covered by Sun bug 6185380; Sun bug 5105194 covers the rest.
5.3 More on Name Services
The Sparks project has just gone live.
We are in contact with that project team and in fact several of their
members are subscribed to our nwam-discuss list, so as their work
progresses, our plans will evolve to match it, and this section will
be updated accordingly.
5.4 Diagnosability
Although improving diagnosability of network problems is not one of
our requirements, everyone agrees that we should at least not make
this problem worse. Our plans are not yet more defined than using the
LOG_DAEMON facility and syslogging various messages at various severities
so that curious users can understand what profiled does and why, but these
plans should become more refined as we progress from architecture to design.
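The logging plan can be sketched with Python's standard `syslog` module, which wraps the same openlog/syslog calls. The message format and the `log_transition` helper are assumptions for illustration; only the LOG_DAEMON facility and the use of several severities come from the text.

```python
import syslog

# Opened once at daemon start-up; "profiled" is the daemon name used above.
syslog.openlog("profiled", syslog.LOG_PID, syslog.LOG_DAEMON)


def log_transition(event: str, link: str, action: str,
                   priority: int = syslog.LOG_INFO) -> str:
    """Record what profiled did and why, at the given severity."""
    msg = f"{event} on {link}: {action}"
    syslog.syslog(priority, msg)
    return msg
```

For instance, a lost network could be logged at LOG_NOTICE and a failed DHCP attempt at LOG_WARNING, while routine transitions stay at LOG_INFO.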
5.5 To Restart or not to Restart?
A great deal of discussion has occurred regarding whether or not profiled
should be a delegated network restarter [see smf(5), smf_restarter(5),
smf_method(5), svc.startd(1M) and inetd(1M) for what this means]. At
present, our intention is that profiled will not be such a restarter,
as we do not see sufficient upside to match the complexity downside. But
this may change as we move into design and prototyping.
6. Dependencies with the rest of the System
In order to tie the NWAM architecture into the overall system we need
to specify what other subsystems it interfaces with and what
requirements it drives on those systems.
6.1 Service Discovery
Bonjour is a zero-configuration technology developed and made popular by
Apple. Gnome has support via Avahi. Overall this set of features is called
Service Discovery. Its main intersection with profiles will be in the
assignment of a hostname. The system will keep a set of aliases, including
a user-provided hostname, a service-discovery-provided hostname and a
DHCP-provided hostname, allowing any of them to refer to the current node.
NWAM will have to be able to detect when a name changes so that the user
can propagate the change to interested applications.
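Keeping one alias per source and notifying interested parties only when a source's name really changes could look like the sketch below. `HostnameAliases`, its listener callback and the source labels are hypothetical; how the change is then propagated to applications is left open, as in the text.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, Optional[str], Optional[str]], None]
SOURCES = ("user", "service-discovery", "dhcp")


class HostnameAliases:
    """Hypothetical holder for the node's hostname aliases, one per source."""

    def __init__(self) -> None:
        self.names: Dict[str, Optional[str]] = {s: None for s in SOURCES}
        self.listeners: List[Listener] = []

    def update(self, source: str, name: Optional[str]) -> bool:
        """Record the name reported by one source; notify listeners
        (e.g. a UI prompt) only if it actually changed."""
        old = self.names[source]
        if old == name:
            return False
        self.names[source] = name
        for notify in self.listeners:
            notify(source, old, name)
        return True

    def aliases(self) -> List[str]:
        """Every name that currently refers to this node."""
        return sorted({n for n in self.names.values() if n})
```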
6.2 Virtualization
Another aspect of configuration is how we deal with the plethora of
virtualization technologies being developed. Virtualization technologies
include:
- Zones are a basic virtualization technology that can be used independently or together with several of the other virtualization technologies.
- BrandZ is an attribute of a zone, so it
should be dealt with in the same manner as a zone.
- Xen creates enough separation between
the host OS and the guest OS that it is unlikely we will interact with
it much.
- Crossbow
- VNICs:
One feature that Crossbow will introduce is virtual NICs. This is an
independent construct from stack instances. As long as they act like real
NICs, our design should work fine with them. The current administrative model
of VNICs is in flux. When that settles down, the subset of parameters which
can be managed by NWAM will need to be determined.
- Stack Instances:
Those VNICs can be bound to stack instances which can be associated with a
zone. The idea is that exclusive bindings between zone and stack instances
would create VNICs only in the zone. In that case the events have to be
generated in the zone. We would need something in the zone to consume those
events.
Currently the administrative model of zones is restrictive: anything that
involves configuration has to be done in the global zone. Additionally, neither
sysevents nor routing sockets (both information sources for NWAM) work in
zones. As a result, NWAM must reside mostly in the global zone. Future plans
might allow zones to administer the resources given to them and to receive
various events. In that case, a profile daemon could run in a zone and provide
independent management of that zone's configuration.
6.3 HAL
HAL is a schema and API for accessing information about the system. It was
initially envisioned as the layer Gnome needed to abstract the hardware.
Applications of HAL range from a battery status meter to a network manager.
There is an effort (to be documented soon on the OpenSolaris web site) to
get HAL into Solaris for use in volume management. HAL uses DBUS as its IPC
mechanism. It has system-specific back-ends beneath a common upper layer, so
the upper layer just works across systems.
HAL's schema provides support for link-layer devices. We could add entries
to the HAL schema to meet our needs, but it is not clear that doing so would
be efficient. We have therefore decided not to rely on HAL as a source of
information for NWAM.
6.4 Visual Panels
Visual Panels is a new project aimed at creating a better way to configure
Solaris. The current demo includes a hook for Network Profiles. Our UI design
will have to reflect the need to integrate with Visual Panels.
6.5 Predictive Self-Healing
Sun's Predictive Self-Healing features in Solaris are represented by the
Fault Management Architecture (FMA) and the Service Management Facility (SMF;
see section 5 of this document). These form an important part of Solaris'
diagnosability story.
With SMF we investigated various levels of interface. These came down to a
choice between a fairly coarse, high-level manipulation of SMF objects and
a finer, lower-level modeling of the system in SMF objects, with SMF
subsequently used to drive many of the state changes. With a more detailed
model of the system in SMF, we could use the diagnosability features of SMF
(svcs -x) to help users determine what is wrong with their system. Initially
we thought modeling one interface per SMF instance would not be possible, so
we discussed using properties to store interface information. However, in
discussion with the SMF team we discovered that such modeling had been
considered when SMF was designed.
The next question was what value we could give our users by implementing a
finer SMF model. We discussed three classes of errors:
- service configuration errors (e.g. /etc/ssh/sshd_config)
- local network errors (e.g. an ethernet cable unplugged)
- errors on the network (e.g. a DNS server misbehaving)
The first class is already largely covered by what we have: most services
are managed by SMF, and they would still be managed by SMF under our coarser
model.
The second class could be modeled in SMF, but we could not come up with any examples where doing so would help diagnose problems better than the current mechanisms.
The third class is the interesting one. The DNS example alone raises many
tough problems. These range from disruptions in network connectivity outside
the user's network (e.g. making the server unreachable) to problems on the
remote DNS server itself (e.g. responses that are slow or missing for some
subset of the namespace). Because of the way most modern networking is
designed, and specifically how IP works, these kinds of problems do not trace
back to an interface in any useful manner. A host may have a single interface
onto a backbone of many paths, or redundant and/or independent interfaces onto
the network. Either way, the interface that sends out a request is unrelated
to most of the problems its packets can encounter. The model would instead
need to contain objects that are external to the box (routers, servers,
services), and this problem does not seem to be tractable.
Given that the current model suffices for NWAM, and we could not find
interesting diagnosability problems that a finer model would help the
user solve, we decided to use the coarser model with SMF.
FMA models system components so that faults can be centrally managed. The
control mechanism for FMA is provided by fmd(1M). One way to apply this to
the networking subsystem would be to put the MAC layers in a hardware class
(hc:/...) and to map them to administrator-named data-link objects. Tools
would be provided to set up relationships and control these devices.
Clearview provides much of this functionality, apart from the FMA objects.
In attempting to find uses for MAC-level FMA objects, we ran into the same issues we did with SMF: the interesting problems are caused not by local objects (e.g. interfaces) but by things on the network. At this time we do not
think building MAC-level FMA functionality would add value to this project.
6.6 Install
NWAM will interact with install by providing tools that install can use to
modify individual system parameters (e.g. hostname) and to configure basic
networking. A separate document discusses the install strategy.
Appendix B: Revision History
| Revision | Date | Changes |
| 0.1 | 2006-Feb-13 | initial draft |
| 0.1.1 | 2006-Feb-15 | minor clarifications |
| 0.1.2 | 2006-Feb-15 | minor clarifications |
| 0.2 | 2006-Feb-17 | Section 1 organized & clarified, conditions introduced |
| 0.2.1 | 2006-Feb-23 | Section 2 organized & clarified, other minor clarifications |
| 0.2.2 | 2006-Feb-24 | Section 3 organized & clarified |
| 0.2.3 | 2006-Feb-27 | Appendix A added, ¶ added to §3.1, §3.5 moved to §6.5 |
| 0.2.4 | 2006-Mar-06 | §3.1 and §3.2 updated, §3.5 added |
| 0.2.5 | 2006-Mar-09 | §6.1 and §6.2 updated |
| 0.3 | 2006-Mar-20 | Section 5 organized & clarified |
| 0.3.1 | 2006-Mar-28 | Modified §3.2 & §3.5; added §3.6 |
| 0.3.2 | 2006-Mar-28 | Added Section 7 |
| 0.3.3 | 2006-Mar-28 | Modified §6.1 |
| 0.3.4 | 2006-Mar-28 | Modified §6.2 |
| 0.3.5 | 2006-Mar-29 | Modified §6.3 |
| 0.3.6 | 2006-Mar-29 | Added §6.6 |
| 0.3.7 | 2006-Mar-29 | Modified §6.4 |
| 0.4 | 2006-Mar-29 | Significantly modified §2.2 |
| 0.4.1 | 2006-Mar-30 | Modified §6.2, §6.6 and Section 7 |
| 0.4.2 | 2006-Mar-31 | Glossary moved from Section 7 to Appendix A; Revision History moved from Appendix A to B |
| 0.5 | 2006-Mar-31 | Significantly modified §1.1, §1.3, §1.4, §2.1, §2.3 & §4; Link-Layer and Upper-Layer profiles introduced |
| 0.5.1 | 2006-Apr-03 | Significantly modified §6.5; changed § name from FMA to Predictive Self-Healing |
| 1.0 | 2006-Apr-03 | Architecture complete |
| 1.0.1 | 2006-Apr-04 | Revised §5.3 per Sparks project going live |
| 1.0.2 | 2006-Apr-05 | Revised §6.5 per Dave B's comments on Predictive Self-Healing |
| 1.0.3 | 2006-Apr-05 | Added introduction to §4.1 |
| 1.0.4 | 2006-Apr-18 | Changed Section 4 name from Configuration to Profiles |
| 1.0.5 | 2006-Jul-27 | Added Renee's blog URL to team list |
| 1.0.6 | 2006-Aug-30 | Moved Glossary to its own page |
| 2.0 | 2007-Feb-16 | Section 0 added to point out differences between Architecture & Design |