OpenSolaris

Network Auto-Magic Architecture

Version 2.0, 2007-Feb-16
John Beck Jim Carlson Renee Danson Michael Hunter
Anay Panvalkar Kacheong Poon Garima Tripathi Jan Xie

There are six focus areas described below:

  1. Overview & Component Interaction
  2. State Machine
  3. Event Handler
  4. Profiles
  5. Network Service Model
  6. Dependencies with the rest of the System
There are also two appendices:
  1. Glossary
  2. Revision History

0. Architecture vs. Design

This Architecture document was completed in April 2006, then used as the basis for the Design Document which was completed in February 2007. Most of the high-level plans laid out here can be found in a similar lower-level form there, though as one might expect, some changes were made as part of the design process:
  • Link-Layer Profiles (LLPs) were renamed to Network Configuration Profiles (NCPs), and their form changed somewhat. LLPs were per-link, whereas an NCP describes the entire system, and is made up of NCUs (U == Units), where each NCU describes a link or an IP interface.
  • Upper-Layer Profiles (ULPs) were renamed to Environments, but were otherwise little changed.
  • The State Machine / Event Handler is largely similar at a high level, but its functional details are rather different.
  • Our Network Service Model is largely similar, but again, some details have changed.
Keep these changes in mind when reading this document.

1. Overview & Component Interaction

1.1 Introduction

Network Profiles, the primary component of the Network Auto-Magic project, are a way to simplify network configuration management. They work by allowing users to specify various properties which determine how things work in different circumstances. The properties include, but are not limited to:
  • Link-Layer
    • which network interface(s) to use
    • how to obtain IP address(es) for the interface(s) in use
    • whether or not a given link should be configured automatically
    • parallel interfaces to the same subnet (i.e., link aggregation and IP Multipathing)
    • the relative priority vs. other Link-Layer profiles
  • Upper-Layer
    • conditions under which this profile should be activated
    • which name service(s) to use
    • a host name (and any required variations thereof)
    • routing information
    • a set of IP filter rules
    • smf(5) services
    • user-specified post-activation "hook"
    • the relative priority vs. other Upper-Layer profiles
Note that this dual-layer model was chosen to support "overlay" profiles, as discussed in our Story Boards and Requirements. Examples are provided below to illustrate this.

1.2 Overview

Let us begin with an architectural overview. The primary components are:
  • The profile repository. This is where the configuration program stores its data, which will also be read by the profile daemon.
  • The profile configuration program (a.k.a. the UI).
    • Note that there will be both CLI and GUI versions of this program which will perform similar if not identical tasks.
    • In addition to using the repository, it also interacts with the profile daemon.
    • Tasks which users will use this program to perform include:
      • creating, modifying and deleting profiles
      • activating one or more profiles
      • querying information about profiles
  • The profile daemon.
    • This reads data from the repository.
    • It reacts to events as notified by the event handler.
    • It reacts to changes which users make via the configuration program.
    • The "state machine" described in Section 2 is implemented in this daemon.
    • The daemon also interacts with the SMF network services.
  • The event handler. This will likely have at least some kernel component, which will report information about events. A user-land component will gather this information and report it to the profile daemon. The user-land component may exist within the profile daemon itself.
  • The SMF network services. These are already part of Solaris, but we expect to modify them to some extent. The daemon will restart / refresh some of these services as needed.

1.3 Interactions

How they interact is roughly as follows:
  • At any given time, one or more Link-Layer profiles and exactly one Upper-Layer profile are "active".
  • At boot, the profile daemon consults the repository for the current active Link-Layer profile(s), proceeds until one or more IP addresses have been configured, checks the conditions of the Upper-Layer profiles, activates the highest priority one whose conditions match, and configures the network(s) accordingly. It is not yet clear whether:
    • the active profile(s) is/are always persistent across reboots
    or
    • there may be support for temporarily active profiles which do not persist across reboots
  • As events occur which may trigger a change in the network configuration, the event handler detects these and notifies the daemon accordingly. The daemon in turn consults the active profiles and may reconfigure the network(s) accordingly. Note that some of these events may indicate that the conditions have changed.
  • A change in conditions may trigger activating a different Link-Layer profile, which may in turn trigger a change in the Upper-Layer profile, which may in turn affect the network configuration. A change in conditions may also trigger activating a different Upper-Layer profile directly, without changing the underlying Link-Layer profile(s).
  • If a user modifies a profile, the configuration program updates the repository and notifies the daemon. If the current active profile is modified, then the daemon may reconfigure the network(s) accordingly.
  • Likewise, if a user activates a new profile (at either layer), then the configuration program updates the repository and notifies the daemon, which may then reconfigure the network(s) accordingly. Note that a user can always manually activate a profile (at either layer), regardless of conditions. Also note that users who desire total control will be able to specify conditions such that a different profile is never activated automatically.
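The boot-time selection step described above might be sketched as follows. This is an illustration only: the `UpperLayerProfile` class, the `facts` dictionary, the profile names, and the convention that a larger integer means a higher priority are all assumptions made for the example, not decisions recorded in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class UpperLayerProfile:
    name: str
    priority: int  # assumed convention: larger value wins
    condition: Callable[[Dict[str, object]], bool]


def select_upper_layer(profiles: List[UpperLayerProfile],
                       facts: Dict[str, object]) -> Optional[UpperLayerProfile]:
    """Return the highest-priority profile whose condition matches, if any."""
    matching = [p for p in profiles if p.condition(facts)]
    return max(matching, key=lambda p: p.priority) if matching else None


# Hypothetical profiles: a catch-all fallback and a wired-office profile.
profiles = [
    UpperLayerProfile("no-network", 0, lambda facts: True),
    UpperLayerProfile("office", 10, lambda facts: facts.get("wired") is True),
]
```

A manual activation by the user would simply bypass `select_upper_layer`, since a user can always activate a profile regardless of conditions.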

1.4 Examples

  • There will likely be an out-of-the-box pair of profiles for "no network" which specify files for everything in /etc/nsswitch.conf, disable services which make no sense in a stand-alone environment, etc. Then at boot, the profile daemon would consult the conditions, note that there was no networking, and automatically select that pair of profiles.
  • A user at Sun might specify an Upper-Layer "SWAN" profile:
    • conditions of the form "apply when a wired network with IP addresses in the range 129.144.0.0/12 is detected"
    • a property to use name server X
    • a property to use files/dns/nis or files/nis in /etc/nsswitch.conf
    • a property to use NIS server Y
    • enable the SMF service nis/client
    • etc.
  • A user might specify a Link-Layer profile which specifies "when I detect a WLAN on interface bcmndis0 with ESSID X and BSSID Y, then use DHCP to get an IP address", then a related Upper-Layer profile whose conditions activate it contingent upon the above, then have a user "hook" to punchin, which in turn activates the punchin Link-Layer profile, which creates an IPsec tunnel, which ultimately leads to the Upper-Layer "SWAN" profile being activated.
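As a sketch of how the "SWAN" condition in the second example could be evaluated, the following uses Python's standard `ipaddress` module. The function name and the `link_type` strings are assumptions for illustration; only the address range comes from the example above.

```python
import ipaddress

# Address range taken from the "SWAN" example condition.
SWAN_RANGE = ipaddress.ip_network("129.144.0.0/12")


def swan_condition(link_type: str, address: str) -> bool:
    """True when a wired link holds an address inside the SWAN range."""
    return link_type == "wired" and ipaddress.ip_address(address) in SWAN_RANGE
```

Note that a /12 prefix starting at 129.144.0.0 covers 129.144.0.0 through 129.159.255.255, so an address such as 129.160.0.1 would not match.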

2. State Machine

One of our focus areas is "State Machine", which needs to cover both the abstract set of states for the profile daemon, and the set of possible transitions between those states. For now, we will focus on the transitions, with the idea that sufficiently specifying the transitions may suggest what the states themselves should be.

2.1 Initial Conditions

Since the new network/profile service will be replacing the existing network/physical and network/initial services (see §5.2), all of the start-up functionality of those two services will need to be accounted for. Though the state machine is designed to make bringing up an interface at boot time as much as possible like bringing up an interface on a running system, there will be a few initialization tasks required:
  • Plumbing of loopback interfaces.
  • Walking the device tree to find existing network links. During normal operation, we would receive notification of the arrival of a new link; this notification kicks off the process of bringing up that link. At boot time, those event notifications will not occur, so we will have to find all existing links and "manually" kick off the bring-up of those links.
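One way to picture the second task: at boot, the daemon could synthesize the link++ events that it would otherwise have received as notifications, so that the rest of the bring-up path is identical to the running-system case. The sketch below assumes the device-tree walk has already produced a list of link names; the function name and the event tuple shape are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Tuple

Event = Tuple[str, str]  # (event name, link name); an assumed representation


def seed_boot_events(existing_links: Iterable[str]) -> Deque[Event]:
    """Queue one synthetic link++ event per link found at boot."""
    return deque(("link++", link) for link in existing_links)
```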

2.2 Event-Driven Transitions

A transition is made up of a series of reactions to a given event. These events come in pairs: some new thing is available, some old thing is no longer available. There are three event pairs which frame the "life-cycle" of a link:
  • link is created: link++

    A new link has been added to the system. Common possible reasons:

    • a NIC is hot-plugged/DR inserted
    • a new tunnel has been configured

    The system reacts by:

    • plumbing the link (if necessary)
    • if the link is wireless, attempting to connect: scan for APs, consult list of "known" ESSIDs for matches (may also have to explicitly try connecting to user-specified ESSIDs that do not advertise). If an available ESSID matches one on our list, connect. If not (depending on policy), present available ESSIDs to user for selection.

  • link comes up (RUNNING flag is set): network++

    A link has become available for use. Common possible reasons:

    • a LAN cable is plugged in
    • a wireless link has connected to an AP

    The system reacts by:

    • Gathering information about the network: DHCP server availability, VLANs and/or subnets present
    • Consulting the Link-Layer profile to determine if this link should be configured. If no, note its availability, but leave link down and do nothing else; if yes, bring link up.

  • link gets IP address: ip++

    An IP address has been assigned to the link. Common possible reasons:

    • DHCP lease has been obtained
    • IPv6 stateless address autoconf has completed
    • a link-local address has been configured
    • a static address has been assigned

    The system reacts by:

    • Consulting Upper-Layer profiles to decide what higher-layer configuration is required; this could include applying completely new configuration (for name services, ipfilter, etc.), restarting existing services, or possibly doing nothing.
    • Activating new configuration as needed.

  • link loses IP address: ip--

    An IP address has been removed from the link. Common possible reasons:

    • DHCP lease has expired
    • an autoconfigured address has timed out
    • a link-local address has been removed
    • a static address has been removed

    The system reacts by:

    • Consulting Upper-Layer profiles to decide what higher-layer configuration changes, if any, are required.
    • Activating new configuration as needed.

  • link goes down: network--

    A link is no longer available for use. Common possible reasons:

    • a LAN cable is unplugged
    • a connection with an AP is lost

    The system reacts by:

    • If link was not in use (UP flag is off), just deleting its info from our list of available links.
    • Consulting Link-Layer profile to determine if other RUNNING links should be brought up, and then following the rest of the network++ process for those links as needed.

  • link is removed: link--

    A link has been removed from the system. Common possible reasons:

    • a NIC is unplugged/DR removed
    • an existing tunnel has been torn down

    The system reacts by:

    • If link was up (RUNNING), taking it down (this will cause "link goes down" steps to take place).
    • Unplumbing the link.
Note that booting and resuming from suspend are really just special cases where one or more of the ++ cases appear to happen at once, as the daemon will attempt to "de-queue" all pending events whenever it starts or resumes (i.e., it will attempt to examine all pending events before handling any of them). This will be part of the daemon's "damping" to maximize stability (more on this below).

Likewise, shutdown and suspend are really just special cases where the -- events happen on all links at the same time.

Generally, the normal sequence for an interface being added to the system would be link++, followed by network++, followed by one or more ip++. When an interface is removed, there would be a network-- followed by a link-- event; it is quite likely that there would not be explicit ip-- events initiating the process.
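The six events and their reactions lend themselves to a simple dispatch table. The sketch below is not the daemon's design: the `Dispatcher` class is hypothetical, and each handler merely records a string describing the reaction listed above instead of performing any configuration.

```python
from typing import Callable, Dict, List


class Dispatcher:
    def __init__(self) -> None:
        self.actions: List[str] = []  # stand-in for real reconfiguration work
        record = self.actions.append
        self.handlers: Dict[str, Callable[[str], None]] = {
            "link++": lambda link: record(f"plumb {link}"),
            "network++": lambda link: record(f"consult Link-Layer profile for {link}"),
            "ip++": lambda link: record(f"consult Upper-Layer profiles ({link} gained an address)"),
            "ip--": lambda link: record(f"consult Upper-Layer profiles ({link} lost an address)"),
            "network--": lambda link: record(f"consider other RUNNING links in place of {link}"),
            "link--": lambda link: record(f"unplumb {link}"),
        }

    def dispatch(self, event: str, link: str) -> None:
        """Run the reaction for one event; unknown events raise KeyError."""
        self.handlers[event](link)


# The normal sequence for a newly added interface.
d = Dispatcher()
for event in ("link++", "network++", "ip++"):
    d.dispatch(event, "bge0")
```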

An interesting case would be when a network-- is not followed by a link--. It could be a transient failure, or it could be a move from one network to another (in which case a network++ would follow at some point), or it could simply be that that link is "gone," whether by admin choice or because of a longer-term failure. Since the network is gone, we can do nothing with respect to it per se. But we can start a timer, then once that timer "pops" (per the profile), we might either reset all connections (if the number of networks is now 0) or try to get all services using the "dead" network to transition to one of the other networks (if the number of networks is now ≥ 1). Also note that when the timer pops, we set the state so that a subsequent "network++" event follows the "there was no previous network" path rather than the "there was a previous network" path. But note that if we get a "network++" event before the timer pops, and determine that the "new network" is the same as the "old network", then we will attempt to "damp" the events out and act as if neither event had occurred.
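The timer-based damping described above can be modelled without real timers by passing timestamps explicitly. In this sketch the class name, the three result strings, and the idea that a network can be identified by an opaque string are all assumptions; how the daemon would actually decide that the "new network" is the same as the "old network" is left open here.

```python
from typing import Dict, Tuple


class NetworkDamper:
    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds  # per-profile timer length
        # link -> (network identity, time the network-- was seen)
        self.pending: Dict[str, Tuple[str, float]] = {}

    def network_down(self, link: str, network_id: str, now: float) -> None:
        """Record a network-- and (conceptually) start the timer."""
        self.pending[link] = (network_id, now)

    def network_up(self, link: str, network_id: str, now: float) -> str:
        """Classify a network++ relative to any earlier network--."""
        previous = self.pending.pop(link, None)
        if previous is None:
            return "no-previous-network"
        old_id, went_down = previous
        if now - went_down >= self.hold_seconds:
            return "no-previous-network"  # the timer had already popped
        # Same network before the timer pops: act as if nothing happened.
        return "damped" if old_id == network_id else "previous-network"
```

A result of "damped" means neither event is acted upon; the other two results select between the two network++ paths mentioned above.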

2.3 User-Driven Transitions

There are also user-driven transitions: whenever a user modifies the active profile (of either layer), or activates a different profile (at either layer), then the new active profile(s) may result in a transition. Depending on the change(s), there may be nothing to do, or there may be minor reconfigurations to make, or it may be that the user did the equivalent of pressing a giant red "reset" button.
    A note on "punchin" (the IPsec-based VPN which many of us use to access the SWAN remotely): although tunnels coming and going should be detected by the event handler and thus be handled by the profile daemon as an event-driven transition, it would probably be better for us to work with the punchin team to integrate our stuff together so that punching in and out would involve using our interfaces, and thus be user-driven transitions, with the profile daemon doing the heavy lifting instead of the punchin script.

    2.4 Implications

    It is not clear if these transitions suggest any sort of traditional simple state model. E.g., the Zones model whose primary states are Configured, Installed and Running seems impossibly simplistic for what we are trying to achieve. Instead, it seems that we ought to come up with an abstract representation of the network configuration, and that abstraction will become the "state". Then whenever the user modifies the active profile or activates a different profile, the network configuration will be changed accordingly, as will our abstract "state". Likewise, whenever an event forces a reconfiguration, the new configuration will be reflected in our abstract "state".
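To make the idea of "configuration as state" concrete, here is a minimal sketch in which the state is an immutable snapshot and a transition is simply the difference between two snapshots. The field names are invented for the example; the real abstraction is yet to be defined.

```python
from dataclasses import dataclass, fields
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class NetworkState:
    up_links: FrozenSet[str]
    addresses: FrozenSet[Tuple[str, str]]  # (link, address) pairs
    upper_layer_profile: Optional[str]


def changed_fields(old: NetworkState, new: NetworkState) -> Dict[str, tuple]:
    """Map each field that differs to its (old, new) values."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


before = NetworkState(frozenset({"bge0"}), frozenset(), None)
after = NetworkState(frozenset({"bge0"}),
                     frozenset({("bge0", "192.168.1.5")}), "home")
```

Here `changed_fields(before, after)` would report changes to `addresses` and `upper_layer_profile` but not to `up_links`.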

    3. Event Handler

    The event handler must interface with the kernel, but will probably mostly run in user-land.
    The event handler will monitor several sources of information: hald, routing socket, sysevents (and, longer term, link FMRIs); current thinking is that this monitoring will take place within the "profile daemon" entity.

    In addition to the monitoring component, work may be required in the kernel to ensure that information is reported in a consistent manner; hald back-end support may also need to be added (this will benefit other projects as well as this one).

    3.1 Information & Events

    What information needs to be delivered? What events are we concerned with?
    • Link creation/removal (signals that a link exists)
      • Physical hardware: card removal / insertion (note that one card may lead to multiple links being created -- e.g., qfe).
        • This should cover both DR and PCMCIA/cardbus hot-plug
      • Virtual links: creation / removal of:
        • IP Tunnels
        • Link aggregations
        • VLANs
        • Future: Crossbow VNICs?
    • Link up/down (signals the link is or is not available for use)
      • On wired networks: is the cable plugged in?
      • On wireless networks: are we connected?
    • Link health (signals how healthy the link is)
      • On wireless networks: what is the signal strength?
      • Future: error rate heuristics; failed links in an aggregation.
    • Link availability (which networks are available)
      • On wireless: available networks/APs
      • On wired/wireless: IP address assigned

    3.2 Obtaining Information

    • link creation/removal

      This covers both physical links (e.g., [un]plugging a NIC) and virtual links (e.g. a tunnel, or an aggregation).

      We can subscribe to event notifications using the sysevent user subscription API; EC_DEV_ADD and EC_DEV_REMOVE classes for subclass ESC_NETWORK should do the trick.

      One possible complication: the sysevents report device names, based on the device driver name. Clearview vanity names will introduce the notion of link names; we will need to be able to map appropriately, as we will want to allow users to talk about link names, not device names.

      It has been suggested that we really should only care about the creation of new links, and not devices. Clearview will add sysevents that cover link creation/removal (refer to Clearview UV design doc, §6.2.7).

    • link up/down

      DL_NOTE_LINK_UP/DOWN translates to toggling of IFF_RUNNING which can be monitored on a routing socket.

      Support for DL_NOTE_LINK_UP/DOWN is not consistent across all drivers. We should plan to make sure that support is in place for the most commonly used of Sun's drivers, either by doing the work ourselves or by working with the driver teams to do it, and we will still need to work with drivers that do not support DL_NOTE_LINK_UP/DOWN.

      For wireless drivers, the IFF_RUNNING flag should represent whether or not there is a connection, either to an AP or an ad-hoc network. However, it does not appear that this is part of the existing interface to which WiFi drivers are being written/ported. We should look into adding this to the interface; but if that is not possible, then the connected state of a wireless driver should be queryable.

    • link health/link availability: wireless-specific information

      An earlier PSARC case (whose official title was WiFi PCMCIA Driver Productization, although it is colloquially known as wificonfig) defines a set of (unstable) wireless driver ioctls that make up the interface with wificonfig. Those interfaces are likely to change, though, as work in that area is going on right now. The new library will be available for us, and will allow us to query signal strength and request a scan for available networks.

      This information will not be reported as an asynchronous event; the event handler will need to query for this information. Preferred behavior is always to query when wireless interfaces are present, whether or not the interface is currently attached; if the scans create performance problems, the rate may be reduced, or the scans could be eliminated altogether.

      There may also be roaming features (discussed in §3.5 below) which make querying for available APs necessary even when connected.

    • link availability: IP address

      This is easy enough to get from a routing socket.
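For the link up/down case, the event handler would compare the interface flags it last saw with the flags reported in a new routing-socket message. The sketch below shows only that comparison; reading the routing socket is not shown. The flag values are assumed to match those in <net/if.h> and should be checked against the target system.

```python
# Assumed values; confirm against <net/if.h> on the target system.
IFF_UP = 0x1
IFF_RUNNING = 0x40


def link_state_event(old_flags: int, new_flags: int):
    """Translate a change in IFF_RUNNING into network++ / network--."""
    was_running = bool(old_flags & IFF_RUNNING)
    is_running = bool(new_flags & IFF_RUNNING)
    if is_running and not was_running:
        return "network++"
    if was_running and not is_running:
        return "network--"
    return None  # IFF_RUNNING did not toggle
```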

    3.3 Delivering Information

    How will the information be delivered to consumers?
    This is probably a more detailed design question, as the current thinking is that the event handler will be part of the profile daemon entity, our primary consumer.

    3.4 Storing Information

    Will the information be stored anywhere? If so, when should snapshots be taken? How many should be stored?
    It is not clear that we need a repository at this point.

    3.5 Roaming

    Roaming can be broken up into two different layers: L2 and L3. L2 is the case where there are multiple APs, on the same LAN; the IP address does not change when migrating from one AP to another. This is something that hardware implementations (either in the hardware itself or in the driver) seem to take care of (based on casual Googling and toting OS X and Solaris laptops around the building).
    L3 is harder. MobileIP tries to make it transparent; that is something that we might consider doing at some point under the NWAM umbrella, but will not be part of phase 1 of the project. It has also been suggested that there are some cases where simply making the switch, without worrying about keeping existing connections alive, might be preferred behavior. This sort of gets back to the question of intent we have discussed elsewhere: is the user just surfing and checking mail, so switching to a cheaper network when it pops up is painless; or does s/he have long-term ssh sessions that must stay up at all costs?

    3.6 How to respond to external configuration changes?

    • Types of changes

      There are two types of changes which an administrator can make: changes which affect one or more of the Conditions, and changes which affect one or more of the Profile Attributes. E.g., taking an interface down via ifconfig would be an example of the former, while using svcadm to disable the ntp/client service would be an example of the latter.

    • Responding to these changes

      Changes in the conditions need to be propagated: that is part of what Network Auto-Magic is all about. So e.g., using ifconfig to take an interface down would have the same effect as if a LAN cable had been unplugged from the NIC corresponding to that interface, thus causing a change in conditions, so the profile daemon would have to reevaluate them, and possibly activate a different profile as a result.

      Changes in state/configuration prescribed by profile attributes result in inconsistencies with the profile. There are two problems with this: (1) the inconsistency might confuse users in the future, and (2) the user might have made a (very expensive) mistake. Fixing (1) seems to be outside the scope of NWAM. NWAM seems to make fixing (2) possible (e.g., by undoing the user's action), though it is not clear that it is worth the additional complexity.

      As to whether a given change affects a Condition, a Profile Attribute, neither, or both, we will need to come up with a list of things which we will need to monitor (e.g., network service states, network interface states, etc.), and classify each, as part of the low-level design of this project. But for now we suspect that there will be few if any things in both lists, and if there are any, they can be handled as special cases.
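The classification exercise described above might eventually reduce to a lookup table plus a response per category. The entries and response strings below are placeholders drawn from the two examples in this section (interface state as a Condition, service state as a Profile Attribute); the real lists are to be produced during low-level design.

```python
# Placeholder classification; the real list is a design-phase deliverable.
CLASSIFICATION = {
    "interface-state": "condition",   # e.g. ifconfig taking an interface down
    "service-state": "attribute",     # e.g. svcadm disabling ntp/client
}


def respond_to_change(kind: str) -> str:
    """Return the (assumed) response to an external change of this kind."""
    category = CLASSIFICATION.get(kind)
    if category == "condition":
        return "re-evaluate profile conditions"
    if category == "attribute":
        return "note inconsistency with active profile"
    return "ignore"
```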

    4. Profiles

    Specification of network configuration is divided into two types of profiles. The first is applied at the link++ and network++ stages of a link's life. It is made up of the conditions and attributes required to determine which links should be used and how those links should be configured. This will be referred to as the Link-Layer Profile.

    The second is applied at the ip++ stage of a link's life, though not every ip++ or ip-- transition will result in a configuration change. It is made up of the conditions and attributes required to configure higher-layer aspects of a link: things like name services, firewall rules, or proxies. This part of the configuration is not dependent on how IP connectivity is achieved, just that it exists on some set of links. This will be referred to as the Upper-Layer Profile.

    4.1 Link-Layer Profile

    Links are configured individually, as link++ and network++ transitions occur (see §2.2). A Link-Layer profile contains attributes used to configure a link. A rule-set will determine link/profile mappings; some of the types of mappings that should be allowed are:
    • profile foo applies to all links
    • profile foo applies to all wired links, profile bar to all wireless links
    • profile foo applies to bge2, profile bar applies to all other links
    • profile foo applies to all wireless links; prefer pcwl0 over bcmndis0
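A rule-set of this kind could be represented as an ordered list of (predicate, profile name) pairs. The sketch below encodes the third mapping above and assumes, purely for illustration, that the first matching rule wins; the `Link` class and its fields are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Link:
    name: str
    wireless: bool


Rule = Tuple[Callable[[Link], bool], str]

# "profile foo applies to bge2, profile bar applies to all other links"
RULES: List[Rule] = [
    (lambda link: link.name == "bge2", "foo"),
    (lambda link: True, "bar"),
]


def profile_for(link: Link, rules: List[Rule]) -> Optional[str]:
    """Return the profile named by the first rule that matches the link."""
    for matches, profile in rules:
        if matches(link):
            return profile
    return None
```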
    The following attributes make up a Link-Layer profile:
    • how to obtain IP address(es) for the link(s) in use
      • Whether address(es) should come from DHCP, be statically assigned, be link local, be auto-configured, etc. Might be useful to be able to specify that an appropriate addr should be chosen from a user-specified pool of addresses (which would be available to multiple links).
      • Multiple sources might be a possibility (nsswitch-like mechanism?)
      • If static, should be able to specify multiple addresses.
    • whether or not a given link should be configured automatically
      • Should this link be configured automatically, without user intervention.
      • Default is true for wired interfaces, false for wireless interfaces.
    • ESSIDs that can be joined automatically
      • Should be generated automatically as knowledge of different ESSIDs is acquired; user may also manually add to it.
      • Will store authentication information for the ESSID, such as security model and keys.
      • Will have default ranking rules, but the user should be able to modify these.
    • relative priority vs. other Link-Layer profiles
      • Stored as an integer, but might be presented to the user differently.
      • Used to decide which Link-Layer profile should be used; the system will apply the profile with the highest priority among those that have the necessary links available.
      • May have multiple profiles at the same priority level; in this case, as many as possible should be used (unless another profile with higher priority is also available).
      • Priority might be part of the rule-set used to select a profile, rather than part of the profile itself.
    • parallel interfaces to the same subnet (i.e., link aggregation and IP Multipathing)
      • This is an advanced feature which may be added down the road.
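The priority rules above (highest priority among profiles whose links are available, with ties all being used) can be sketched as follows. The `LinkLayerProfile` fields and the "larger integer wins" convention are assumptions; as noted, priority might instead live in the rule-set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class LinkLayerProfile:
    name: str
    priority: int                 # assumed convention: larger value wins
    required_links: FrozenSet[str]


def select_link_layer(profiles: List[LinkLayerProfile],
                      available: Set[str]) -> List[LinkLayerProfile]:
    """Return every usable profile that shares the highest priority."""
    usable = [p for p in profiles if p.required_links <= available]
    if not usable:
        return []
    best = max(p.priority for p in usable)
    return [p for p in usable if p.priority == best]


candidates = [
    LinkLayerProfile("wired-a", 10, frozenset({"bge0"})),
    LinkLayerProfile("wired-b", 10, frozenset({"bge1"})),
    LinkLayerProfile("wireless", 5, frozenset({"pcwl0"})),
]
```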

    4.2 Upper-Layer Profile

    A set of these properties and resources is applied to the system once IP services are available (i.e. IP addresses are configured on running interfaces). Unlike Link-Layer profiles, only one Upper-Layer profile may be active at any time. Upper-Layer profile selection criteria include:
    • specific link(s) that is(are) up
    • wireless network/ESSID to which we have connected
    • IP address range/subnet
    • domain look-up (perhaps from DHCP server, since we might not have configured name services yet)
    • user input (user explicitly chooses a profile; if it cannot be done with the currently available links, complain)
    The following attributes make up an Upper-Layer profile:
    • conditions under which this profile should be activated
    • relative priority vs. other Upper-Layer profiles
    • which name service(s) to use
    • a host name (and any required variations thereof)
    • routing information
    • a set of IP Filter rules
    • smf(5) services
    • HTTP proxies
    • "hook" mechanism: user-specified action taken when profile is activated

    4.3 User Intent

    An issue which has come up repeatedly during design discussions has been that of "user intent". For example, laptops are used very differently than servers, and test servers may be used very differently than production servers. So a knob to indicate this intent seems like a good idea. The form this knob may take needs to be worked out.

    5. Network Service Model

    This section might also be called "how we interact with SMF".

    5.1 Milestones

    There will be two system network states, each of which is represented by an SMF milestone service:
    • milestone/network: basic network APIs are functional (i.e., applications can make socket() calls) and any configured packet filtering is enabled on the available network interfaces.
    • milestone/name-services: one or more name services (e.g., DNS, NIS, NIS+, LDAP) are selected, and each selected service is configured properly.
    milestone/name-services should depend on milestone/network.

    The profile daemon (profiled) will manipulate these two milestones. When network events happen or the active profile changes (via any of the mechanisms described in §2.2 and §2.3), profiled will decide the state to which each of these milestones should be set. When conditions change sufficiently (again, see §2.2 and §2.3), milestone/network will likely need to be refreshed. Each network service should depend on either milestone/network or milestone/name-services. We need to identify the complete list of services in each category, and update their dependencies accordingly.
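The dependency relationships described in this section can be pictured as a small graph. SMF itself resolves service dependencies; the sketch below only illustrates the resulting ordering using Python's standard `graphlib` module (Python 3.9 or later). The two `example/...` service names are made up.

```python
from graphlib import TopologicalSorter

# Each key depends on the services in its value set.
DEPENDS_ON = {
    "milestone/network": set(),
    "milestone/name-services": {"milestone/network"},
    "example/needs-sockets": {"milestone/network"},        # hypothetical service
    "example/needs-names": {"milestone/name-services"},    # hypothetical service
}


def start_order(depends_on):
    """Return one valid order in which dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())
```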

    5.2 Profile Daemon

    A new SMF service, network/profile, will start profiled. It should only depend on network/pfil[1] and should be started very early. The current network configuration services (network/physical, network/initial and network/service) should be removed and most of their tasks[2] taken over by the profile daemon.

    Based on the information from the active profile, the profile daemon may also enable or disable some name services. For example, when the system migrates from a profile which specifies using DNS to a profile which specifies using NIS, profiled should disable dns/client and enable nis/client.
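The DNS-to-NIS example amounts to a set difference between the name services of the old and new profiles. The helper below is a hypothetical sketch that only builds the corresponding svcadm command lines as strings; whether profiled would invoke svcadm or use a library interface is not decided here.

```python
from typing import List, Set


def name_service_commands(old: Set[str], new: Set[str]) -> List[str]:
    """Build svcadm commands to move from the old service set to the new."""
    commands = [f"svcadm disable {svc}" for svc in sorted(old - new)]
    commands += [f"svcadm enable {svc}" for svc in sorted(new - old)]
    return commands
```

Services present in both profiles are left untouched.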

    [1] network/pfil is an existing SMF network service. It autopushes pfil onto the filtering interface and restricts network traffic during startup. It does not depend on anything.
    [2] Some of the pieces, such as ipqos and IPsec, need to be split out into separate services. More details of this will be worked out as we migrate from architecture to design. The IPsec part is covered by Sun bug 6185380; Sun bug 5105194 covers the rest.

    5.3 More on Name Services

    The Sparks project has just gone live. We are in contact with that project team and in fact several of their members are subscribed to our nwam-discuss list, so as their work progresses, our plans will evolve to match it, and this section will be updated accordingly.

    5.4 Diagnosability

    Although improving diagnosability of network problems is not one of our requirements, everyone agrees that we should at least not make this problem worse. Our plans are not yet more defined than using the LOG_DAEMON facility and syslogging various messages at various severities so that curious users can understand what profiled does and why, but these plans should become more refined as we progress from architecture to design.
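As a rough sketch of the logging plan, the following uses Python's standard `syslog` module with the LOG_DAEMON facility. The message kinds and the severity assigned to each are invented for the example; the actual messages and levels are to be settled during design.

```python
import syslog

# Hypothetical mapping from kinds of daemon activity to syslog severities.
SEVERITY = {
    "event-received": syslog.LOG_DEBUG,
    "profile-activated": syslog.LOG_NOTICE,
    "configuration-failed": syslog.LOG_ERR,
}


def severity_for(kind: str) -> int:
    """Return the severity for a message kind, defaulting to LOG_INFO."""
    return SEVERITY.get(kind, syslog.LOG_INFO)


def log_decision(kind: str, message: str) -> None:
    """Send one message to syslog at the severity chosen for its kind."""
    syslog.syslog(severity_for(kind), message)


# Tag messages with the daemon name and use the LOG_DAEMON facility.
syslog.openlog(ident="profiled", facility=syslog.LOG_DAEMON)
```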

    5.5 To Restart or not to Restart?

    A great deal of discussion has occurred regarding whether or not profiled should be a delegated network restarter [see smf(5), smf_restarter(5), smf_method(5), svc.startd(1M) and inetd(1M) for what this means]. At present, our intention is that profiled will not be such a restarter, as we do not see sufficient upside to match the complexity downside. But this may change as we move into design and prototyping.

    6. Dependencies with the rest of the System

    In order to tie the NWAM architecture into the overall system we need to specify what other subsystems it interfaces with and what requirements it drives on those systems.

    6.1 Service Discovery

    Bonjour is a zero-configuration technology developed and made popular by Apple. Gnome has support via Avahi. Overall this set of features is called Service Discovery. Its main intersection with profiles will be in the assignment of a hostname. The system will keep a set of aliases including a user-provided hostname, a service-discovery-provided hostname, and a DHCP-provided hostname, allowing any of them to refer to the current node. NWAM will have to be able to detect when a name changes in order to allow the user to propagate that change to interested applications.

    6.2 Virtualization

    Another aspect of configuration is how we deal with the plethora of virtualization technologies being developed. Virtualization technologies include:
    • Zones are a basic virtualization technology that can be used independently or together with several of the other virtualization technologies.
    • BrandZ is an attribute of a zone so should be dealt with in the same manner as a zone.
    • Xen creates enough separation between the host OS and the guest OS that it is unlikely we will interact with it much.
    • Crossbow
      1. VNICs: One feature that Crossbow will introduce is virtual NICs. This is an independent construct from stack instances. As long as they act like real NICs our design should work fine with them. The current administrative model of VNICs is in flux. When that settles down, the subset of parameters which can be managed by NWAM will need to be determined.
      2. Stack Instances: Those VNICs can be bound to stack instances which can be associated with a zone. The idea is that exclusive bindings between zone and stack instances would create VNICs only in the zone. In that case the events have to be generated in the zone. We would need something in the zone to consume those events.

    Currently the administrative model of zones is restrictive: anything that involves configuration has to be done in the global zone. Additionally, neither sysevents nor routing sockets (our information sources) work in zones. That drives NWAM to live mostly in the global zone. Future plans might change zones so that they can administer resources given to them and receive various events. In that case a profile daemon could exist in a zone, providing independent management of its configuration.

    6.3 HAL

    HAL is a schema and API for accessing information about the system. Initially it was envisioned as the layer Gnome needed to abstract the hardware. Applications of HAL are as diverse as the battery status meter and a network manager.

    There is an effort (to be documented soon on the OpenSolaris web site) to get HAL into Solaris for use in volume management. HAL uses D-Bus as its IPC mechanism. There are system-specific back-ends and an upper layer which also just work.

    HAL's schema provides support for link-layer devices. While we could add entries to the HAL schema in order to meet our needs, it is not clear that this would be efficient. We have decided not to rely on HAL as a source of information for NWAM.

    6.4 Visual Panels

    Visual Panels are a new project aimed at creating a better way to configure Solaris. The current demo includes a hook for Network Profiles. Our UI design will have to reflect the need to integrate with Visual Panels.

    6.5 Predictive Self Healing

    Sun's Predictive Self-Healing features in Solaris are represented by Fault Management (FMA) and the Service Manager (SMF; section 5 of this document). These form an important part of Solaris' diagnosability story.

    With SMF we investigated various levels of interface. These broke down to a choice between a fairly coarse / high-level manipulation of SMF objects or a finer / lower-level modeling of the system in SMF objects and subsequently using SMF to drive many of the state changes. With a more detailed model of the system in SMF we could use the diagnosability features of SMF (svcs -x) to help users determine what is wrong with their system. Initially we thought modeling an interface per SMF instance would not be possible so we discussed using properties to store interface information. But upon discussion with the SMF team we discovered that such modeling had been considered when SMF was designed.

    The next question was to see what value we could give to our users by implementing a finer SMF model. There were 3 classes of errors we discussed:

    • service configuration errors (e.g. /etc/ssh/sshd_config)
    • local net errors (e.g. ethernet cable unplugged)
    • errors on the network (DNS server misbehaving)

    The first case is already covered by much of what we have. Most services are managed by SMF. Even in our coarser model those services would still be managed by SMF.

    The second class could be modeled in SMF. We could not come up with any examples where modeling in SMF helped diagnose problems any better than the current mechanisms.

    The third class is the interesting one. There are many tough problems in this DNS example alone, from trying to figure out any of the problems which might have come from disruptions in network connectivity outside of the user's network (e.g. making the server unreachable), to remote DNS server problems (e.g. making response slow or non-existent for some subset of the namespace). Due to the way that most modern networking is designed and specifically how IP works, these kinds of problems do not trace back to an interface in any useful manner. From having a single interface onto a backbone of many paths to having redundant and/or independent interfaces onto the network, the interface that sends out the request is not related to most of the problems those packets can encounter. Instead the model would really need to contain objects which are external to the box (routers, servers, services). This problem does not seem to be tractable.

    Given that the current model suffices for NWAM and we could not find interesting diagnosability problems that a finer model would help the user solve, we decided to use the coarser model with SMF.

    FMA models system components so that faults can be centrally managed. The control mechanism for FMA is provided by fmd(1M). One way to do this for the networking subsystem would be putting the MAC layers in a hardware class (hc:/...) and having administrator named data-link objects to which they map. Tools would be provided to set up relationships and control these devices. Clearview provides much of this functionality other than the FMA objects.

    In attempting to find uses for MAC-level FMA objects we run into the same issues we did with SMF. The interesting problems are not caused by local objects (e.g. interfaces) but instead by things on the network. At this time we do not think building MAC-level FMA functionality would add to this project.

    6.6 Install

    NWAM will interact with install by providing tools that install can use to modify individual system parameters (e.g. hostname) and to configure basic networking. A document discussing the install strategy is available here.

    Appendix A: Glossary

    Appendix B: Revision History

    Revision  Date         Changes
    0.1       2006-Feb-13  initial draft
    0.1.1     2006-Feb-15  minor clarifications
    0.1.2     2006-Feb-15  minor clarifications
    0.2       2006-Feb-17  Section 1 organized & clarified, conditions introduced
    0.2.1     2006-Feb-23  Section 2 organized & clarified, other minor clarifications
    0.2.2     2006-Feb-24  Section 3 organized & clarified
    0.2.3     2006-Feb-27  Appendix A added, ¶ added to §3.1, §3.5 moved to §6.5
    0.2.4     2006-Mar-06  §3.1 and §3.2 updated, §3.5 added
    0.2.5     2006-Mar-09  §6.1 and §6.2 updated
    0.3       2006-Mar-20  Section 5 organized & clarified
    0.3.1     2006-Mar-28  Modified §3.2 & §3.5; added §3.6
    0.3.2     2006-Mar-28  Added Section 7
    0.3.3     2006-Mar-28  Modified §6.1
    0.3.4     2006-Mar-28  Modified §6.2
    0.3.5     2006-Mar-29  Modified §6.3
    0.3.6     2006-Mar-29  Added §6.6
    0.3.7     2006-Mar-29  Modified §6.4
    0.4       2006-Mar-29  Significantly modified §2.2
    0.4.1     2006-Mar-30  Modified §6.2, §6.6 and Section 7
    0.4.2     2006-Mar-31  Glossary moved from Section 7 to Appendix A
                           Revision History moved from Appendix A to B
    0.5       2006-Mar-31  Significantly modified §1.1, §1.3, §1.4, §2.1, §2.3 & §4
                           Link-Layer and Upper-Layer profiles introduced
    0.5.1     2006-Apr-03  Significantly modified §6.5
                           Changed § name from FMA to Predictive Self-Healing
    1.0       2006-Apr-03  Architecture complete
    1.0.1     2006-Apr-04  Revised §5.3 per Sparks project going live
    1.0.2     2006-Apr-05  Revised §6.5 Dave B's comments on Predictive Self Healing
    1.0.3     2006-Apr-05  Added introduction to §4.1
    1.0.4     2006-Apr-18  Changed Section 4 name from Configuration to Profiles
    1.0.5     2006-Jul-27  Added Renee's blog URL to team list
    1.0.6     2006-Aug-30  Moved Glossary to its own page
    2.0       2007-Feb-16  Section 0 added to point out differences between Architecture & Design