aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
authorPatrick McHardy <kaber@trash.net>2013-03-31 18:10:34 +0200
committerPatrick McHardy <kaber@trash.net>2013-03-31 18:10:34 +0200
commit70711d223510ba1773cfe1d7770a56141c815ff8 (patch)
tree4a71f38a3a554ddecaa31b7d8c6bc49b7d1705b4 /Documentation/networking
parentd53b4ed072d9779cdf53582c46436dec06d0961f (diff)
parent19f949f52599ba7c3f67a5897ac6be14bfcb1200 (diff)
Merge tag 'v3.8' of /home/kaber/src/repos/linux
Linux 3.8 Signed-off-by: Patrick McHardy <kaber@trash.net> Conflicts: include/linux/Kbuild include/linux/netlink.h
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/batman-adv.txt15
-rw-r--r--Documentation/networking/bonding.txt36
-rw-r--r--Documentation/networking/bridge.txt13
-rw-r--r--Documentation/networking/caif/Linux-CAIF.txt91
-rw-r--r--Documentation/networking/can.txt186
-rw-r--r--Documentation/networking/ip-sysctl.txt155
-rw-r--r--Documentation/networking/netconsole.txt19
-rw-r--r--Documentation/networking/netdev-features.txt2
-rw-r--r--Documentation/networking/openvswitch.txt2
-rw-r--r--Documentation/networking/packet_mmap.txt246
-rw-r--r--Documentation/networking/s2io.txt14
-rw-r--r--Documentation/networking/stmmac.txt69
-rw-r--r--Documentation/networking/vxge.txt7
-rw-r--r--Documentation/networking/vxlan.txt47
14 files changed, 699 insertions, 203 deletions
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index 75a592365af..c1d82047a4b 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -75,9 +75,10 @@ folder:
There is a special folder for debugging information:
-# ls /sys/kernel/debug/batman_adv/bat0/
-# bla_claim_table log socket transtable_local
-# gateways originators transtable_global vis_data
+# ls /sys/kernel/debug/batman_adv/bat0/
+# bla_backbone_table log transtable_global
+# bla_claim_table originators transtable_local
+# gateways socket vis_data
Some of the files contain all sort of status information regard-
ing the mesh network. For example, you can view the table of
@@ -202,7 +203,8 @@ abled during run time. Following log_levels are defined:
2 - Enable messages related to route added / changed / deleted
4 - Enable messages related to translation table operations
8 - Enable messages related to bridge loop avoidance
-15 - enable all messages
+16 - Enable messaged related to DAT, ARP snooping and parsing
+31 - Enable all messages
The debug output can be changed at runtime using the file
/sys/class/net/bat0/mesh/log_level. e.g.
@@ -211,6 +213,11 @@ The debug output can be changed at runtime using the file
will enable debug messages for when routes change.
+Counters for different types of packets entering and leaving the
+batman-adv module are available through ethtool:
+
+# ethtool --statistics bat0
+
BATCTL
------
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a33890..10a015c384b 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,22 @@ xmit_hash_policy
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
- generate the hash. The formula is
+ generate the hash. The IPv4 formula is
(((source IP XOR dest IP) AND 0xffff) XOR
( source MAC XOR destination MAC ))
modulo slave count
+ The IPv6 formula is
+
+ hash = (source ip quad 2 XOR dest IP quad 2) XOR
+ (source ip quad 3 XOR dest IP quad 3) XOR
+ (source ip quad 4 XOR dest IP quad 4)
+
+ (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+ XOR (source MAC XOR destination MAC))
+ modulo slave count
+
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
the formula is the same as for the layer2 transmit
@@ -778,19 +788,29 @@ xmit_hash_policy
slaves, although a single connection will not span
multiple slaves.
- The formula for unfragmented TCP and UDP packets is
+ The formula for unfragmented IPv4 TCP and UDP packets is
((source port XOR dest port) XOR
((source IP XOR dest IP) AND 0xffff)
modulo slave count
- For fragmented TCP or UDP packets and all other IP
- protocol traffic, the source and destination port
+ The formula for unfragmented IPv6 TCP and UDP packets is
+
+ hash = (source port XOR dest port) XOR
+ ((source ip quad 2 XOR dest IP quad 2) XOR
+ (source ip quad 3 XOR dest IP quad 3) XOR
+ (source ip quad 4 XOR dest IP quad 4))
+
+ ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+ modulo slave count
+
+ For fragmented TCP or UDP packets and all other IPv4 and
+ IPv6 protocol traffic, the source and destination port
information is omitted. For non-IP traffic, the
formula is the same as for the layer2 transmit hash
policy.
- This policy is intended to mimic the behavior of
+ The IPv4 policy is intended to mimic the behavior of
certain switches, notably Cisco switches with PFC2 as
well as some Foundry and IBM products.
@@ -1210,7 +1230,7 @@ options, you may wish to use the "max_bonds" module parameter,
documented above.
To create multiple bonding devices with differing options, it is
-preferrable to use bonding parameters exported by sysfs, documented in the
+preferable to use bonding parameters exported by sysfs, documented in the
section below.
For versions of bonding without sysfs support, the only means to
@@ -1950,7 +1970,7 @@ access to fail over to. Additionally, the bonding load balance modes
support link monitoring of their members, so if individual links fail,
the load will be rebalanced across the remaining devices.
- See Section 13, "Configuring Bonding for Maximum Throughput"
+ See Section 12, "Configuring Bonding for Maximum Throughput"
for information on configuring bonding with one peer device.
11.2 High Availability in a Multiple Switch Topology
@@ -2620,7 +2640,7 @@ be found at:
https://lists.sourceforge.net/lists/listinfo/bonding-devel
- Discussions regarding the developpement of the bonding driver take place
+ Discussions regarding the development of the bonding driver take place
on the main Linux network mailing list, hosted at vger.kernel.org. The list
address is:
diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt
index a7ba5e4e2c9..a27cb6214ed 100644
--- a/Documentation/networking/bridge.txt
+++ b/Documentation/networking/bridge.txt
@@ -1,7 +1,14 @@
In order to use the Ethernet bridging functionality, you'll need the
-userspace tools. These programs and documentation are available
-at http://www.linuxfoundation.org/en/Net:Bridge. The download page is
-http://prdownloads.sourceforge.net/bridge.
+userspace tools.
+
+Documentation for Linux bridging is on:
+ http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
+
+The bridge-utilities are maintained at:
+ git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git
+
+Additionally, the iproute2 utilities can be used to configure
+bridge devices.
If you still have questions, don't hesitate to post to the mailing list
(more info https://lists.linux-foundation.org/mailman/listinfo/bridge).
diff --git a/Documentation/networking/caif/Linux-CAIF.txt b/Documentation/networking/caif/Linux-CAIF.txt
index e52fd62bef3..0aa4bd381be 100644
--- a/Documentation/networking/caif/Linux-CAIF.txt
+++ b/Documentation/networking/caif/Linux-CAIF.txt
@@ -19,60 +19,36 @@ and host. Currently, UART and Loopback are available for Linux.
Architecture:
------------
The implementation of CAIF is divided into:
-* CAIF Socket Layer, Kernel API, and Net Device.
+* CAIF Socket Layer and GPRS IP Interface.
* CAIF Core Protocol Implementation
* CAIF Link Layer, implemented as NET devices.
RTNL
!
- ! +------+ +------+ +------+
- ! +------+! +------+! +------+!
- ! ! Sock !! !Kernel!! ! Net !!
- ! ! API !+ ! API !+ ! Dev !+ <- CAIF Client APIs
- ! +------+ +------! +------+
- ! ! ! !
- ! +----------!----------+
- ! +------+ <- CAIF Protocol Implementation
- +-------> ! CAIF !
- ! Core !
- +------+
- +--------!--------+
- ! !
- +------+ +-----+
- ! ! ! TTY ! <- Link Layer (Net Devices)
- +------+ +-----+
-
-
-Using the Kernel API
-----------------------
-The Kernel API is used for accessing CAIF channels from the
-kernel.
-The user of the API has to implement two callbacks for receive
-and control.
-The receive callback gives a CAIF packet as a SKB. The control
-callback will
-notify of channel initialization complete, and flow-on/flow-
-off.
-
-
- struct caif_device caif_dev = {
- .caif_config = {
- .name = "MYDEV"
- .type = CAIF_CHTY_AT
- }
- .receive_cb = my_receive,
- .control_cb = my_control,
- };
- caif_add_device(&caif_dev);
- caif_transmit(&caif_dev, skb);
-
-See the caif_kernel.h for details about the CAIF kernel API.
+ ! +------+ +------+
+ ! +------+! +------+!
+ ! ! IP !! !Socket!!
+ +-------> !interf!+ ! API !+ <- CAIF Client APIs
+ ! +------+ +------!
+ ! ! !
+ ! +-----------+
+ ! !
+ ! +------+ <- CAIF Core Protocol
+ ! ! CAIF !
+ ! ! Core !
+ ! +------+
+ ! +----------!---------+
+ ! ! ! !
+ ! +------+ +-----+ +------+
+ +--> ! HSI ! ! TTY ! ! USB ! <- Link Layer (Net Devices)
+ +------+ +-----+ +------+
+
I M P L E M E N T A T I O N
===========================
-===========================
+
CAIF Core Protocol Layer
=========================================
@@ -88,17 +64,13 @@ The Core CAIF implementation contains:
- Simple implementation of CAIF.
- Layered architecture (a la Streams), each layer in the CAIF
specification is implemented in a separate c-file.
- - Clients must implement PHY layer to access physical HW
- with receive and transmit functions.
- Clients must call configuration function to add PHY layer.
- Clients must implement CAIF layer to consume/produce
CAIF payload with receive and transmit functions.
- Clients must call configuration function to add and connect the
Client layer.
- When receiving / transmitting CAIF Packets (cfpkt), ownership is passed
- to the called function (except for framing layers' receive functions
- or if a transmit function returns an error, in which case the caller
- must free the packet).
+ to the called function (except for framing layers' receive function)
Layered Architecture
--------------------
@@ -109,11 +81,6 @@ Implementation. The support functions include:
CAIF Packet has functions for creating, destroying and adding content
and for adding/extracting header and trailers to protocol packets.
- - CFLST CAIF list implementation.
-
- - CFGLUE CAIF Glue. Contains OS Specifics, such as memory
- allocation, endianness, etc.
-
The CAIF Protocol implementation contains:
- CFCNFG CAIF Configuration layer. Configures the CAIF Protocol
@@ -128,7 +95,7 @@ The CAIF Protocol implementation contains:
control and remote shutdown requests.
- CFVEI CAIF VEI layer. Handles CAIF AT Channels on VEI (Virtual
- External Interface). This layer encodes/decodes VEI frames.
+ External Interface). This layer encodes/decodes VEI frames.
- CFDGML CAIF Datagram layer. Handles CAIF Datagram layer (IP
traffic), encodes/decodes Datagram frames.
@@ -170,7 +137,7 @@ The CAIF Protocol implementation contains:
+---------+ +---------+
! !
+---------+ +---------+
- | | | Serial |
+ | | | Serial |
| | | CFSERL |
+---------+ +---------+
@@ -186,24 +153,20 @@ In this layered approach the following "rules" apply.
layer->dn->transmit(layer->dn, packet);
-Linux Driver Implementation
+CAIF Socket and IP interface
===========================
-Linux GPRS Net Device and CAIF socket are implemented on top of the
-CAIF Core protocol. The Net device and CAIF socket have an instance of
+The IP interface and CAIF socket API are implemented on top of the
+CAIF Core protocol. The IP Interface and CAIF socket have an instance of
'struct cflayer', just like the CAIF Core protocol stack.
Net device and Socket implement the 'receive()' function defined by
'struct cflayer', just like the rest of the CAIF stack. In this way, transmit and
receive of packets is handled as by the rest of the layers: the 'dn->transmit()'
function is called in order to transmit data.
-The layer on top of the CAIF Core implementation is
-sometimes referred to as the "Client layer".
-
-
Configuration of Link Layer
---------------------------
-The Link Layer is implemented as Linux net devices (struct net_device).
+The Link Layer is implemented as Linux network devices (struct net_device).
Payload handling and registration is done using standard Linux mechanisms.
The CAIF Protocol relies on a loss-less link layer without implementing
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index ac295399f0d..820f55344ed 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -22,7 +22,8 @@ This file contains
4.1.2 RAW socket option CAN_RAW_ERR_FILTER
4.1.3 RAW socket option CAN_RAW_LOOPBACK
4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
- 4.1.5 RAW socket returned message flags
+ 4.1.5 RAW socket option CAN_RAW_FD_FRAMES
+ 4.1.6 RAW socket returned message flags
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
4.3 connected transport protocols (SOCK_SEQPACKET)
4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -41,7 +42,8 @@ This file contains
6.5.1 Netlink interface to set/get devices properties
6.5.2 Setting the CAN bit-timing
6.5.3 Starting and stopping the CAN network device
- 6.6 supported CAN hardware
+ 6.6 CAN FD (flexible data rate) driver support
+ 6.7 supported CAN hardware
7 Socket CAN resources
@@ -232,16 +234,16 @@ solution for a couple of reasons:
arbitration problems and error frames caused by the different
ECUs. The occurrence of detected errors are important for diagnosis
and have to be logged together with the exact timestamp. For this
- reason the CAN interface driver can generate so called Error Frames
- that can optionally be passed to the user application in the same
- way as other CAN frames. Whenever an error on the physical layer
+ reason the CAN interface driver can generate so called Error Message
+ Frames that can optionally be passed to the user application in the
+ same way as other CAN frames. Whenever an error on the physical layer
or the MAC layer is detected (e.g. by the CAN controller) the driver
- creates an appropriate error frame. Error frames can be requested by
- the user application using the common CAN filter mechanisms. Inside
- this filter definition the (interested) type of errors may be
- selected. The reception of error frames is disabled by default.
- The format of the CAN error frame is briefly described in the Linux
- header file "include/linux/can/error.h".
+ creates an appropriate error message frame. Error messages frames can
+ be requested by the user application using the common CAN filter
+ mechanisms. Inside this filter definition the (interested) type of
+ errors may be selected. The reception of error messages is disabled
+ by default. The format of the CAN error message frame is briefly
+ described in the Linux header file "include/linux/can/error.h".
4. How to use Socket CAN
------------------------
@@ -273,7 +275,7 @@ solution for a couple of reasons:
struct can_frame {
canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */
- __u8 can_dlc; /* data length code: 0 .. 8 */
+ __u8 can_dlc; /* frame payload length in byte (0 .. 8) */
__u8 data[8] __attribute__((aligned(8)));
};
@@ -375,6 +377,51 @@ solution for a couple of reasons:
nbytes = sendto(s, &frame, sizeof(struct can_frame),
0, (struct sockaddr*)&addr, sizeof(addr));
+ Remark about CAN FD (flexible data rate) support:
+
+ Generally the handling of CAN FD is very similar to the formerly described
+ examples. The new CAN FD capable CAN controllers support two different
+ bitrates for the arbitration phase and the payload phase of the CAN FD frame
+ and up to 64 bytes of payload. This extended payload length breaks all the
+ kernel interfaces (ABI) which heavily rely on the CAN frame with fixed eight
+ bytes of payload (struct can_frame) like the CAN_RAW socket. Therefore e.g.
+ the CAN_RAW socket supports a new socket option CAN_RAW_FD_FRAMES that
+ switches the socket into a mode that allows the handling of CAN FD frames
+ and (legacy) CAN frames simultaneously (see section 4.1.5).
+
+ The struct canfd_frame is defined in include/linux/can.h:
+
+ struct canfd_frame {
+ canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */
+ __u8 len; /* frame payload length in byte (0 .. 64) */
+ __u8 flags; /* additional flags for CAN FD */
+ __u8 __res0; /* reserved / padding */
+ __u8 __res1; /* reserved / padding */
+ __u8 data[64] __attribute__((aligned(8)));
+ };
+
+ The struct canfd_frame and the existing struct can_frame have the can_id,
+ the payload length and the payload data at the same offset inside their
+ structures. This allows to handle the different structures very similar.
+ When the content of a struct can_frame is copied into a struct canfd_frame
+ all structure elements can be used as-is - only the data[] becomes extended.
+
+ When introducing the struct canfd_frame it turned out that the data length
+ code (DLC) of the struct can_frame was used as a length information as the
+ length and the DLC has a 1:1 mapping in the range of 0 .. 8. To preserve
+ the easy handling of the length information the canfd_frame.len element
+ contains a plain length value from 0 .. 64. So both canfd_frame.len and
+ can_frame.can_dlc are equal and contain a length information and no DLC.
+ For details about the distinction of CAN and CAN FD capable devices and
+ the mapping to the bus-relevant data length code (DLC), see chapter 6.6.
+
+ The length of the two CAN(FD) frame structures define the maximum transfer
+ unit (MTU) of the CAN(FD) network interface and skbuff data length. Two
+ definitions are specified for CAN specific MTUs in include/linux/can.h :
+
+ #define CAN_MTU (sizeof(struct can_frame)) == 16 => 'legacy' CAN frame
+ #define CANFD_MTU (sizeof(struct canfd_frame)) == 72 => CAN FD frame
+
4.1 RAW protocol sockets with can_filters (SOCK_RAW)
Using CAN_RAW sockets is extensively comparable to the commonly
@@ -383,7 +430,7 @@ solution for a couple of reasons:
defaults are set at RAW socket binding time:
- The filters are set to exactly one filter receiving everything
- - The socket only receives valid data frames (=> no error frames)
+ - The socket only receives valid data frames (=> no error message frames)
- The loopback of sent CAN frames is enabled (see chapter 3.2)
- The socket does not receive its own sent frames (in loopback mode)
@@ -434,7 +481,7 @@ solution for a couple of reasons:
4.1.2 RAW socket option CAN_RAW_ERR_FILTER
As described in chapter 3.4 the CAN interface driver can generate so
- called Error Frames that can optionally be passed to the user
+ called Error Message Frames that can optionally be passed to the user
application in the same way as other CAN frames. The possible
errors are divided into different error classes that may be filtered
using the appropriate error mask. To register for every possible
@@ -472,7 +519,69 @@ solution for a couple of reasons:
setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
&recv_own_msgs, sizeof(recv_own_msgs));
- 4.1.5 RAW socket returned message flags
+ 4.1.5 RAW socket option CAN_RAW_FD_FRAMES
+
+ CAN FD support in CAN_RAW sockets can be enabled with a new socket option
+ CAN_RAW_FD_FRAMES which is off by default. When the new socket option is
+ not supported by the CAN_RAW socket (e.g. on older kernels), switching the
+ CAN_RAW_FD_FRAMES option returns the error -ENOPROTOOPT.
+
+ Once CAN_RAW_FD_FRAMES is enabled the application can send both CAN frames
+ and CAN FD frames. OTOH the application has to handle CAN and CAN FD frames
+ when reading from the socket.
+
+ CAN_RAW_FD_FRAMES enabled: CAN_MTU and CANFD_MTU are allowed
+ CAN_RAW_FD_FRAMES disabled: only CAN_MTU is allowed (default)
+
+ Example:
+ [ remember: CANFD_MTU == sizeof(struct canfd_frame) ]
+
+ struct canfd_frame cfd;
+
+ nbytes = read(s, &cfd, CANFD_MTU);
+
+ if (nbytes == CANFD_MTU) {
+ printf("got CAN FD frame with length %d\n", cfd.len);
+ /* cfd.flags contains valid data */
+ } else if (nbytes == CAN_MTU) {
+ printf("got legacy CAN frame with length %d\n", cfd.len);
+ /* cfd.flags is undefined */
+ } else {
+ fprintf(stderr, "read: invalid CAN(FD) frame\n");
+ return 1;
+ }
+
+ /* the content can be handled independently from the received MTU size */
+
+ printf("can_id: %X data length: %d data: ", cfd.can_id, cfd.len);
+ for (i = 0; i < cfd.len; i++)
+ printf("%02X ", cfd.data[i]);
+
+ When reading with size CANFD_MTU only returns CAN_MTU bytes that have
+ been received from the socket a legacy CAN frame has been read into the
+ provided CAN FD structure. Note that the canfd_frame.flags data field is
+ not specified in the struct can_frame and therefore it is only valid in
+ CANFD_MTU sized CAN FD frames.
+
+ As long as the payload length is <=8 the received CAN frames from CAN FD
+ capable CAN devices can be received and read by legacy sockets too. When
+ user-generated CAN FD frames have a payload length <=8 these can be send
+ by legacy CAN network interfaces too. Sending CAN FD frames with payload
+ length > 8 to a legacy CAN network interface returns an -EMSGSIZE error.
+
+ Implementation hint for new CAN applications:
+
+ To build a CAN FD aware application use struct canfd_frame as basic CAN
+ data structure for CAN_RAW based applications. When the application is
+ executed on an older Linux kernel and switching the CAN_RAW_FD_FRAMES
+ socket option returns an error: No problem. You'll get legacy CAN frames
+ or CAN FD frames and can process them the same way.
+
+ When sending to CAN devices make sure that the device is capable to handle
+ CAN FD frames by checking if the device maximum transfer unit is CANFD_MTU.
+ The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall.
+
+ 4.1.6 RAW socket returned message flags
When using recvmsg() call, the msg->msg_flags may contain following flags:
@@ -527,7 +636,7 @@ solution for a couple of reasons:
rcvlist_all - list for unfiltered entries (no filter operations)
rcvlist_eff - list for single extended frame (EFF) entries
- rcvlist_err - list for error frames masks
+ rcvlist_err - list for error message frames masks
rcvlist_fil - list for mask/value filters
rcvlist_inv - list for mask/value filters (inverse semantic)
rcvlist_sff - list for single standard frame (SFF) entries
@@ -573,10 +682,13 @@ solution for a couple of reasons:
dev->type = ARPHRD_CAN; /* the netdevice hardware type */
dev->flags = IFF_NOARP; /* CAN has no arp */
- dev->mtu = sizeof(struct can_frame);
+ dev->mtu = CAN_MTU; /* sizeof(struct can_frame) -> legacy CAN interface */
- The struct can_frame is the payload of each socket buffer in the
- protocol family PF_CAN.
+ or alternative, when the controller supports CAN with flexible data rate:
+ dev->mtu = CANFD_MTU; /* sizeof(struct canfd_frame) -> CAN FD interface */
+
+ The struct can_frame or struct canfd_frame is the payload of each socket
+ buffer (skbuff) in the protocol family PF_CAN.
6.2 local loopback of sent frames
@@ -784,15 +896,41 @@ solution for a couple of reasons:
$ ip link set canX type can restart-ms 100
Alternatively, the application may realize the "bus-off" condition
- by monitoring CAN error frames and do a restart when appropriate with
- the command:
+ by monitoring CAN error message frames and do a restart when
+ appropriate with the command:
$ ip link set canX type can restart
- Note that a restart will also create a CAN error frame (see also
- chapter 3.4).
+ Note that a restart will also create a CAN error message frame (see
+ also chapter 3.4).
+
+ 6.6 CAN FD (flexible data rate) driver support
+
+ CAN FD capable CAN controllers support two different bitrates for the
+ arbitration phase and the payload phase of the CAN FD frame. Therefore a
+ second bittiming has to be specified in order to enable the CAN FD bitrate.
+
+ Additionally CAN FD capable CAN controllers support up to 64 bytes of
+ payload. The representation of this length in can_frame.can_dlc and
+ canfd_frame.len for userspace applications and inside the Linux network
+ layer is a plain value from 0 .. 64 instead of the CAN 'data length code'.
+ The data length code was a 1:1 mapping to the payload length in the legacy
+ CAN frames anyway. The payload length to the bus-relevant DLC mapping is
+ only performed inside the CAN drivers, preferably with the helper
+ functions can_dlc2len() and can_len2dlc().
+
+ The CAN netdevice driver capabilities can be distinguished by the network
+ devices maximum transfer unit (MTU):
+
+ MTU = 16 (CAN_MTU) => sizeof(struct can_frame) => 'legacy' CAN device
+ MTU = 72 (CANFD_MTU) => sizeof(struct canfd_frame) => CAN FD capable device
+
+ The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall.
+ N.B. CAN FD capable devices can also handle and send legacy CAN frames.
+
+ FIXME: Add details about the CAN FD controller configuration when available.
- 6.6 Supported CAN hardware
+ 6.7 Supported CAN hardware
Please check the "Kconfig" file in "drivers/net/can" to get an actual
list of the support CAN hardware. On the Socket CAN project website
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6f896b94abd..dbca6618208 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -30,16 +30,24 @@ neigh/default/gc_thresh3 - INTEGER
Maximum number of neighbor entries allowed. Increase this
when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
+ Default: 1024
neigh/default/unres_qlen_bytes - INTEGER
The maximum number of bytes which may be used by packets
queued for each unresolved address by other network layers.
(added in linux 3.3)
+ Setting negative value is meaningless and will return error.
+ Default: 65536 Bytes(64KB)
neigh/default/unres_qlen - INTEGER
The maximum number of packets which may be queued for each
unresolved address by other network layers.
(deprecated in linux 3.3) : use unres_qlen_bytes instead.
+ Prior to linux 3.3, the default value is 3 which may cause
+ unexpected packet loss. The current default value is calculated
+ according to default value of unres_qlen_bytes and true size of
+ packet.
+ Default: 31
mtu_expires - INTEGER
Time, in seconds, that cached PMTU information is kept.
@@ -48,12 +56,6 @@ min_adv_mss - INTEGER
The advertised MSS depends on the first hop route MTU, but will
never be lower than this setting.
-rt_cache_rebuild_count - INTEGER
- The per net-namespace route cache emergency rebuild threshold.
- Any net-namespace having its route cache rebuilt due to
- a hash bucket chain being too long more than this many times
- will have its route caching disabled
-
IP Fragmentation:
ipfrag_high_thresh - INTEGER
@@ -205,15 +207,16 @@ tcp_early_retrans - INTEGER
Default: 2
tcp_ecn - INTEGER
- Enable Explicit Congestion Notification (ECN) in TCP. ECN is only
- used when both ends of the TCP flow support it. It is useful to
- avoid losses due to congestion (when the bottleneck router supports
- ECN).
+ Control use of Explicit Congestion Notification (ECN) by TCP.
+ ECN is used only when both ends of the TCP connection indicate
+ support for it. This feature is useful in avoiding losses due
+ to congestion by allowing supporting routers to signal
+ congestion before having to drop packets.
Possible values are:
- 0 disable ECN
- 1 ECN enabled
- 2 Only server-side ECN enabled. If the other end does
- not support ECN, behavior is like with ECN disabled.
+ 0 Disable ECN. Neither initiate nor accept ECN.
+ 1 Always request ECN on outgoing connection attempts.
+ 2 Enable ECN when requested by incoming connections
+ but do not request ECN on outgoing connections.
Default: 2
tcp_fack - BOOLEAN
@@ -221,15 +224,14 @@ tcp_fack - BOOLEAN
The value is not used, if tcp_sack is not enabled.
tcp_fin_timeout - INTEGER
- Time to hold socket in state FIN-WAIT-2, if it was closed
- by our side. Peer can be broken and never close its side,
- or even died unexpectedly. Default value is 60sec.
- Usual value used in 2.2 was 180 seconds, you may restore
- it, but remember that if your machine is even underloaded WEB server,
- you risk to overflow memory with kilotons of dead sockets,
- FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1,
- because they eat maximum 1.5K of memory, but they tend
- to live longer. Cf. tcp_max_orphans.
+ The length of time an orphaned (no longer referenced by any
+ application) connection will remain in the FIN_WAIT_2 state
+ before it is aborted at the local end. While a perfectly
+ valid "receive only" state for an un-orphaned connection, an
+ orphaned connection in FIN_WAIT_2 state could otherwise wait
+ forever for the remote to close its end of the connection.
+ Cf. tcp_max_orphans
+ Default: 60 seconds
tcp_frto - INTEGER
Enables Forward RTO-Recovery (F-RTO) defined in RFC4138.
@@ -445,7 +447,9 @@ tcp_stdurg - BOOLEAN
tcp_synack_retries - INTEGER
Number of times SYNACKs for a passive TCP connection attempt will
be retransmitted. Should not be higher than 255. Default value
- is 5, which corresponds to ~180seconds.
+ is 5, which corresponds to 31seconds till the last retransmission
+ with the current initial RTO of 1second. With this the final timeout
+ for a passive TCP connection will happen after 63seconds.
tcp_syncookies - BOOLEAN
Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
@@ -468,10 +472,40 @@ tcp_syncookies - BOOLEAN
SYN flood warnings in logs not being really flooded, your server
is seriously misconfigured.
+tcp_fastopen - INTEGER
+ Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data
+ in the opening SYN packet. To use this feature, the client application
+ must use sendmsg() or sendto() with MSG_FASTOPEN flag rather than
+ connect() to perform a TCP handshake automatically.
+
+ The values (bitmap) are
+ 1: Enables sending data in the opening SYN on the client.
+ 2: Enables TCP Fast Open on the server side, i.e., allowing data in
+ a SYN packet to be accepted and passed to the application before
+ 3-way hand shake finishes.
+ 4: Send data in the opening SYN regardless of cookie availability and
+ without a cookie option.
+ 0x100: Accept SYN data w/o validating the cookie.
+ 0x200: Accept data-in-SYN w/o any cookie option present.
+ 0x400/0x800: Enable Fast Open on all listeners regardless of the
+ TCP_FASTOPEN socket option. The two different flags designate two
+ different ways of setting max_qlen without the TCP_FASTOPEN socket
+ option.
+
+ Default: 0
+
+ Note that the client & server side Fast Open flags (1 and 2
+ respectively) must be also enabled before the rest of flags can take
+ effect.
+
+ See include/net/tcp.h and the code for more details.
+
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 255. Default value
- is 5, which corresponds to ~180seconds.
+ is 6, which corresponds to 63seconds till the last retransmission
+ with the current initial RTO of 1second. With this the final timeout
+ for an active TCP connection attempt will happen after 127seconds.
tcp_timestamps - BOOLEAN
Enable timestamps as defined in RFC1323.
@@ -551,6 +585,25 @@ tcp_thin_dupack - BOOLEAN
Documentation/networking/tcp-thin.txt
Default: 0
+tcp_limit_output_bytes - INTEGER
+ Controls TCP Small Queue limit per tcp socket.
+ TCP bulk sender tends to increase packets in flight until it
+ gets losses notifications. With SNDBUF autotuning, this can
+ result in a large amount of packets queued in qdisc/device
+ on the local machine, hurting latency of other flows, for
+ typical pfifo_fast qdiscs.
+ tcp_limit_output_bytes limits the number of bytes on qdisc
+ or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+ Note: For GSO/TSO enabled flows, we try to have at least two
+ packets in flight. Reducing tcp_limit_output_bytes might also
+ reduce the size of individual GSO packet (64KB being the max)
+ Default: 131072
+
+tcp_challenge_ack_limit - INTEGER
+ Limits number of Challenge ACK sent per second, as recommended
+ in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
+ Default: 100
+
UDP variables:
udp_mem - vector of 3 INTEGERs: min, pressure, max
@@ -857,9 +910,19 @@ accept_source_route - BOOLEAN
FALSE (host)
accept_local - BOOLEAN
- Accept packets with local source addresses. In combination with
- suitable routing, this can be used to direct packets between two
- local interfaces over the wire and have them accepted properly.
+ Accept packets with local source addresses. In combination
+ with suitable routing, this can be used to direct packets
+ between two local interfaces over the wire and have them
+ accepted properly.
+
+ rp_filter must be set to a non-zero value in order for
+ accept_local to have an effect.
+
+ default FALSE
+
+route_localnet - BOOLEAN
+ Do not consider loopback addresses as martian source or destination
+ while routing. This enables the use of 127/8 for local routing purposes.
default FALSE
rp_filter - INTEGER
@@ -1268,6 +1331,12 @@ force_tllao - BOOLEAN
race condition where the sender deletes the cached link-layer address
prior to receiving a response to a previous solicitation."
+ndisc_notify - BOOLEAN
+ Define mode for notification of address and device changes.
+ 0 - (default): do nothing
+ 1 - Generate unsolicited neighbour advertisements when device is brought
+ up or hardware address changes.
+
icmp/*:
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
@@ -1398,6 +1467,20 @@ path_max_retrans - INTEGER
Default: 5
+pf_retrans - INTEGER
+ The number of retransmissions that will be attempted on a given path
+ before traffic is redirected to an alternate transport (should one
+ exist). Note this is distinct from path_max_retrans, as a path that
+ passes the pf_retrans threshold can still be used. Its only
+ deprioritized when a transmission path is selected by the stack. This
+ setting is primarily used to enable fast failover mechanisms without
+ having to reduce path_max_retrans to a very low value. See:
+ http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ for details. Note also that a value of pf_retrans > path_max_retrans
+ disables this feature
+
+ Default: 0
+
rto_initial - INTEGER
The initial round trip timeout value in milliseconds that will be used
in calculating round trip times. This is the initial time interval
@@ -1445,6 +1528,20 @@ cookie_preserve_enable - BOOLEAN
Default: 1
+cookie_hmac_alg - STRING
+ Select the hmac algorithm used when generating the cookie value sent by
+ a listening sctp socket to a connecting client in the INIT-ACK chunk.
+ Valid values are:
+ * md5
+ * sha1
+ * none
+ Ability to assign md5 or sha1 as the selected alg is predicated on the
+ configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and
+ CONFIG_CRYPTO_SHA1).
+
+ Default: Dependent on configuration. MD5 if available, else SHA1 if
+ available, else none.
+
rcvbuf_policy - INTEGER
Determines if the receive buffer is attributed to the socket or to
association. SCTP supports the capability to create multiple
@@ -1457,7 +1554,7 @@ rcvbuf_policy - INTEGER
blocking.
1: rcvbuf space is per association
- 0: recbuf space is per socket
+ 0: rcvbuf space is per socket
Default: 0
diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index 8d022073e3e..2e9e0ae2cd4 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -51,8 +51,23 @@ Built-in netconsole starts immediately after the TCP stack is
initialized and attempts to bring up the supplied dev at the supplied
address.
-The remote host can run either 'netcat -u -l -p <port>',
-'nc -l -u <port>' or syslogd.
+The remote host has several options to receive the kernel messages,
+for example:
+
+1) syslogd
+
+2) netcat
+
+ On distributions using a BSD-based netcat version (e.g. Fedora,
+ openSUSE and Ubuntu) the listening port must be specified without
+ the -p switch:
+
+ 'nc -u -l -p <port>' / 'nc -u -l <port>' or
+ 'netcat -u -l -p <port>' / 'netcat -u -l <port>'
+
+3) socat
+
+ 'socat udp-recv:<port> -'
Dynamic reconfiguration:
========================
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index 4164f5c02e4..f310edec8a7 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -164,4 +164,4 @@ read the CRC recorded by the NIC on receipt of the packet.
This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
-PROMISC mdoe.
+PROMISC mode.
diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
index b8a048b8df3..8fa2dd1e792 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -118,7 +118,7 @@ essentially like this, ignoring metadata:
Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
-definitions. With this change, an TCP packet in VLAN 10 would have a
+definitions. With this change, a TCP packet in VLAN 10 would have a
flow key much like this:
eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 1c08a4b0981..94444b152fb 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -3,9 +3,9 @@
--------------------------------------------------------------------------------
This file documents the mmap() facility available with the PACKET
-socket interface on 2.4 and 2.6 kernels. This type of sockets is used for
-capture network traffic with utilities like tcpdump or any other that needs
-raw access to network interface.
+socket interface on 2.4/2.6/3.x kernels. This type of sockets is used for
+i) capture network traffic with utilities like tcpdump, ii) transmit network
+traffic, or any other that needs raw access to network interface.
You can find the latest version of this document at:
http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap
@@ -21,19 +21,18 @@ Please send your comments to
+ Why use PACKET_MMAP
--------------------------------------------------------------------------------
-In Linux 2.4/2.6 if PACKET_MMAP is not enabled, the capture process is very
-inefficient. It uses very limited buffers and requires one system call
-to capture each packet, it requires two if you want to get packet's
-timestamp (like libpcap always does).
+In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very
+inefficient. It uses very limited buffers and requires one system call to
+capture each packet, it requires two if you want to get packet's timestamp
+(like libpcap always does).
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way reading packets just needs to wait for them,
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
-highest bandwidth.
-By using a shared buffer between the kernel and the user also has the benefit
-of minimizing packet copies.
+highest bandwidth. By using a shared buffer between the kernel and the user
+also has the benefit of minimizing packet copies.
It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. At least, if you are capturing
@@ -41,7 +40,8 @@ at high speeds (this is relative to the cpu speed), you should check if the
device driver of your network interface card supports some sort of interrupt
load mitigation or (even better) if it supports NAPI, also make sure it is
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
-supported by devices of your network.
+supported by devices of your network. CPU IRQ pinning of your network interface
+card can also be an advantage.
--------------------------------------------------------------------------------
+ How to use mmap() to improve capture process
@@ -87,9 +87,7 @@ the following process:
socket creation and destruction is straight forward, and is done
the same way with or without PACKET_MMAP:
-int fd;
-
-fd= socket(PF_PACKET, mode, htons(ETH_P_ALL))
+ int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));
where mode is SOCK_RAW for the raw interface were link level
information can be captured or SOCK_DGRAM for the cooked
@@ -163,11 +161,23 @@ As capture, each frame contains two parts:
A complete tutorial is available at: http://wiki.gnu-log.net/
+By default, the user should put data at :
+ frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
+
+So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
+the beginning of the user data will be at :
+ frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
+
+If you wish to put user data at a custom offset from the beginning of
+the frame (for payload alignment with SOCK_RAW mode for instance) you
+can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
+to make this work it must be enabled previously with setsockopt()
+and the PACKET_TX_HAS_OFF option.
+
--------------------------------------------------------------------------------
+ PACKET_MMAP settings
--------------------------------------------------------------------------------
-
To setup PACKET_MMAP from user level code is done with a call like
- Capture process
@@ -201,7 +211,6 @@ indeed, packet_set_ring checks that the following condition is true
frames_per_block * tp_block_nr == tp_frame_nr
-
Lets see an example, with the following values:
tp_block_size= 4096
@@ -227,7 +236,6 @@ be spawned across two blocks, so there are some details you have to take into
account when choosing the frame_size. See "Mapping and use of the circular
buffer (ring)".
-
--------------------------------------------------------------------------------
+ PACKET_MMAP setting constraints
--------------------------------------------------------------------------------
@@ -264,7 +272,6 @@ User space programs can include /usr/include/sys/user.h and
The pagesize can also be determined dynamically with the getpagesize (2)
system call.
-
Block number limit
--------------------
@@ -284,7 +291,6 @@ called pg_vec, its size limits the number of blocks that can be allocated.
v block #2
block #1
-
kmalloc allocates any number of bytes of physically contiguous memory from
a pool of pre-determined sizes. This pool of memory is maintained by the slab
allocator which is at the end the responsible for doing the allocation and
@@ -299,7 +305,6 @@ pointers to blocks is
131072/4 = 32768 blocks
-
PACKET_MMAP buffer size calculator
------------------------------------
@@ -340,7 +345,6 @@ and a value for <frame size> of 2048 bytes. These parameters will yield
and hence the buffer will have a 262144 MiB size. So it can hold
262144 MiB / 2048 bytes = 134217728 frames
-
Actually, this buffer size is not possible with an i386 architecture.
Remember that the memory is allocated in kernel space, in the case of
an i386 kernel's memory size is limited to 1GiB.
@@ -372,7 +376,6 @@ the following (from include/linux/if_packet.h):
- Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
- Pad to align to TPACKET_ALIGNMENT=16
*/
-
The following are conditions that are checked in packet_set_ring
@@ -413,7 +416,6 @@ and the following flags apply:
#define TP_STATUS_LOSING 4
#define TP_STATUS_CSUMNOTREADY 8
-
TP_STATUS_COPY : This flag indicates that the frame (and associated
meta information) has been truncated because it's
larger than tp_frame_size. This packet can be
@@ -462,7 +464,6 @@ packets are in the ring:
It doesn't incur in a race condition to first check the status value and
then poll for frames.
-
++ Transmission process
Those defines are also used for transmission:
@@ -494,6 +495,196 @@ The user can also use poll() to check if a buffer is available:
retval = poll(&pfd, 1, timeout);
-------------------------------------------------------------------------------
++ What TPACKET versions are available and when to use them?
+-------------------------------------------------------------------------------
+
+ int val = tpacket_version;
+ setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
+ getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
+
+where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
+
+TPACKET_V1:
+ - Default if not otherwise specified by setsockopt(2)
+ - RX_RING, TX_RING available
+ - VLAN metadata information available for packets
+ (TP_STATUS_VLAN_VALID)
+
+TPACKET_V1 --> TPACKET_V2:
+ - Made 64 bit clean due to unsigned long usage in TPACKET_V1
+ structures, thus this also works on 64 bit kernel with 32 bit
+ userspace and the like
+ - Timestamp resolution in nanoseconds instead of microseconds
+ - RX_RING, TX_RING available
+ - How to switch to TPACKET_V2:
+ 1. Replace struct tpacket_hdr by struct tpacket2_hdr
+ 2. Query header len and save
+ 3. Set protocol version to 2, set up ring as usual
+ 4. For getting the sockaddr_ll,
+ use (void *)hdr + TPACKET_ALIGN(hdrlen) instead of
+ (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
+
+TPACKET_V2 --> TPACKET_V3:
+ - Flexible buffer implementation:
+ 1. Blocks can be configured with non-static frame-size
+ 2. Read/poll is at a block-level (as opposed to packet-level)
+ 3. Added poll timeout to avoid indefinite user-space wait
+ on idle links
+ 4. Added user-configurable knobs:
+ 4.1 block::timeout
+ 4.2 tpkt_hdr::sk_rxhash
+ - RX Hash data available in user space
+ - Currently only RX_RING available
+
+-------------------------------------------------------------------------------
++ AF_PACKET fanout mode
+-------------------------------------------------------------------------------
+
+In the AF_PACKET fanout mode, packet reception can be load balanced among
+processes. This also works in combination with mmap(2) on packet sockets.
+
+Minimal example code by David S. Miller (try things like "./test eth0 hash",
+"./test eth0 lb", etc.):
+
+#include <stddef.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <unistd.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+
+#include <net/if.h>
+
+static const char *device_name;
+static int fanout_type;
+static int fanout_id;
+
+#ifndef PACKET_FANOUT
+# define PACKET_FANOUT 18
+# define PACKET_FANOUT_HASH 0
+# define PACKET_FANOUT_LB 1
+#endif
+
+static int setup_socket(void)
+{
+ int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
+ struct sockaddr_ll ll;
+ struct ifreq ifr;
+ int fanout_arg;
+
+ if (fd < 0) {
+ perror("socket");
+ return EXIT_FAILURE;
+ }
+
+ memset(&ifr, 0, sizeof(ifr));
+ strcpy(ifr.ifr_name, device_name);
+ err = ioctl(fd, SIOCGIFINDEX, &ifr);
+ if (err < 0) {
+ perror("SIOCGIFINDEX");
+ return EXIT_FAILURE;
+ }
+
+ memset(&ll, 0, sizeof(ll));
+ ll.sll_family = AF_PACKET;
+ ll.sll_ifindex = ifr.ifr_ifindex;
+ err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
+ if (err < 0) {
+ perror("bind");
+ return EXIT_FAILURE;
+ }
+
+ fanout_arg = (fanout_id | (fanout_type << 16));
+ err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
+ &fanout_arg, sizeof(fanout_arg));
+ if (err) {
+ perror("setsockopt");
+ return EXIT_FAILURE;
+ }
+
+ return fd;
+}
+
+static void fanout_thread(void)
+{
+ int fd = setup_socket();
+ int limit = 10000;
+
+ if (fd < 0)
+ exit(fd);
+
+ while (limit-- > 0) {
+ char buf[1600];
+ int err;
+
+ err = read(fd, buf, sizeof(buf));
+ if (err < 0) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ if ((limit % 10) == 0)
+ fprintf(stdout, "(%d) \n", getpid());
+ }
+
+ fprintf(stdout, "%d: Received 10000 packets\n", getpid());
+
+ close(fd);
+ exit(0);
+}
+
+int main(int argc, char **argp)
+{
+ int fd, err;
+ int i;
+
+ if (argc != 3) {
+ fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
+ return EXIT_FAILURE;
+ }
+
+ if (!strcmp(argp[2], "hash"))
+ fanout_type = PACKET_FANOUT_HASH;
+ else if (!strcmp(argp[2], "lb"))
+ fanout_type = PACKET_FANOUT_LB;
+ else {
+ fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
+ exit(EXIT_FAILURE);
+ }
+
+ device_name = argp[1];
+ fanout_id = getpid() & 0xffff;
+
+ for (i = 0; i < 4; i++) {
+ pid_t pid = fork();
+
+ switch (pid) {
+ case 0:
+ fanout_thread();
+
+ case -1:
+ perror("fork");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < 4; i++) {
+ int status;
+
+ wait(&status);
+ }
+
+ return 0;
+}
+
+-------------------------------------------------------------------------------
+ PACKET_TIMESTAMP
-------------------------------------------------------------------------------
@@ -519,6 +710,13 @@ the networking stack is used (the behavior before this setting was added).
See include/linux/net_tstamp.h and Documentation/networking/timestamping
for more information on hardware timestamps.
+-------------------------------------------------------------------------------
++ Miscellaneous bits
+-------------------------------------------------------------------------------
+
+- Packet sockets work well together with Linux socket filters, thus you also
+ might want to have a look at Documentation/networking/filter.txt
+
--------------------------------------------------------------------------------
+ THANKS
--------------------------------------------------------------------------------
diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt
index 4be0c039edb..d2a9f43b554 100644
--- a/Documentation/networking/s2io.txt
+++ b/Documentation/networking/s2io.txt
@@ -136,16 +136,6 @@ For more information, please review the AMD8131 errata at
http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
-6. Available Downloads
-Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up
-to date, also the latest "s2io" code (including support for 2.4 kernels) is
-available via "Support" link on the Neterion site: http://www.neterion.com.
-
-For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com,
-user: linuxdocs password: HALdocs
-
-7. Support
+6. Support
For further support please contact either your 10GbE Xframe NIC vendor (IBM,
-HP, SGI etc.) or click on the "Support" link on the Neterion site:
-http://www.neterion.com.
-
+HP, SGI etc.)
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 5cb9a197246..f9fa6db40a5 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -29,11 +29,9 @@ The kernel configuration option is STMMAC_ETH:
dma_txsize: DMA tx ring size;
buf_sz: DMA buffer size;
tc: control the HW FIFO threshold;
- tx_coe: Enable/Disable Tx Checksum Offload engine;
watchdog: transmit timeout (in milliseconds);
flow_ctrl: Flow control ability [on/off];
pause: Flow Control Pause Time;
- tmrate: timer period (only if timer optimisation is configured).
3) Command line options
Driver parameters can be also passed in command line by using:
@@ -60,17 +58,19 @@ Then the poll method will be scheduled at some future point.
The incoming packets are stored, by the DMA, in a list of pre-allocated socket
buffers in order to avoid the memcpy (Zero-copy).
-4.3) Timer-Driver Interrupt
-Instead of having the device that asynchronously notifies the frame receptions,
-the driver configures a timer to generate an interrupt at regular intervals.
-Based on the granularity of the timer, the frames that are received by the
-device will experience different levels of latency. Some NICs have dedicated
-timer device to perform this task. STMMAC can use either the RTC device or the
-TMU channel 2 on STLinux platforms.
-The timers frequency can be passed to the driver as parameter; when change it,
-take care of both hardware capability and network stability/performance impact.
-Several performance tests on STM platforms showed this optimisation allows to
-spare the CPU while having the maximum throughput.
+4.3) Interrupt Mitigation
+The driver is able to mitigate the number of its DMA interrupts
+using NAPI for the reception on chips older than the 3.50.
+New chips have an HW RX-Watchdog used for this mitigation.
+
+On Tx-side, the mitigation schema is based on a SW timer that calls the
+tx function (stmmac_tx) to reclaim the resource after transmitting the
+frames.
+Also there is another parameter (like a threshold) used to program
+the descriptors avoiding to set the interrupt on completion bit in
+when the frame is sent (xmit).
+
+Mitigation parameters can be tuned by ethtool.
4.4) WOL
Wake up on Lan feature through Magic and Unicast frames are supported for the
@@ -121,6 +121,7 @@ struct plat_stmmacenet_data {
int bugged_jumbo;
int pmt;
int force_sf_dma_mode;
+ int riwt_off;
void (*fix_mac_speed)(void *priv, unsigned int speed);
void (*bus_setup)(void __iomem *ioaddr);
int (*init)(struct platform_device *pdev);
@@ -156,6 +157,7 @@ Where:
o pmt: core has the embedded power module (optional).
o force_sf_dma_mode: force DMA to use the Store and Forward mode
instead of the Threshold.
+ o riwt_off: force to disable the RX watchdog feature and switch to NAPI mode.
o fix_mac_speed: this callback is used for modifying some syscfg registers
(on ST SoCs) according to the link speed negotiated by the
physical layer .
@@ -173,7 +175,6 @@ Where:
For MDIO bus The we have:
struct stmmac_mdio_bus_data {
- int bus_id;
int (*phy_reset)(void *priv);
unsigned int phy_mask;
int *irqs;
@@ -181,7 +182,6 @@ For MDIO bus The we have:
};
Where:
- o bus_id: bus identifier;
o phy_reset: hook to reset the phy device attached to the bus.
o phy_mask: phy mask passed when register the MDIO bus within the driver.
o irqs: list of IRQs, one per PHY.
@@ -230,9 +230,6 @@ there are two MAC cores: one MAC is for MDIO Bus/PHY emulation
with fixed_link support.
static struct stmmac_mdio_bus_data stmmac1_mdio_bus = {
- .bus_id = 1,
- |
- |-> phy device on the bus_id 1
.phy_reset = phy_reset;
|
|-> function to provide the phy_reset on this board
@@ -257,9 +254,11 @@ reset procedure etc).
o Makefile
o stmmac_main.c: main network device driver;
o stmmac_mdio.c: mdio functions;
+ o stmmac_pci: PCI driver;
+ o stmmac_platform.c: platform driver
o stmmac_ethtool.c: ethtool support;
o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
- Only tested on ST40 platforms based.
+ (only tested on ST40 platforms based);
o stmmac.h: private driver structure;
o common.h: common definitions and VFTs;
o descs.h: descriptor structure definitions;
@@ -269,9 +268,11 @@ reset procedure etc).
o dwmac100_core: MAC 100 core and dma code;
o dwmac100_dma.c: dma funtions for the MAC chip;
o dwmac1000.h: specific header file for the MAC;
- o dwmac_lib.c: generic DMA functions shared among chips
- o enh_desc.c: functions for handling enhanced descriptors
- o norm_desc.c: functions for handling normal descriptors
+ o dwmac_lib.c: generic DMA functions shared among chips;
+ o enh_desc.c: functions for handling enhanced descriptors;
+ o norm_desc.c: functions for handling normal descriptors;
+ o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes;
+ o mmc_core.c/mmc.h: Management MAC Counters;
5) Debug Information
@@ -304,7 +305,27 @@ All these are only useful during the developing stage
and should never enabled inside the code for general usage.
In fact, these can generate an huge amount of debug messages.
-6) TODO:
+6) Energy Efficient Ethernet
+
+Energy Efficient Ethernet(EEE) enables IEEE 802.3 MAC sublayer along
+with a family of Physical layer to operate in the Low power Idle(LPI)
+mode. The EEE mode supports the IEEE 802.3 MAC operation at 100Mbps,
+1000Mbps & 10Gbps.
+
+The LPI mode allows power saving by switching off parts of the
+communication device functionality when there is no data to be
+transmitted & received. The system on both the side of the link can
+disable some functionalities & save power during the period of low-link
+utilization. The MAC controls whether the system should enter or exit
+the LPI mode & communicate this to PHY.
+
+As soon as the interface is opened, the driver verifies if the EEE can
+be supported. This is done by looking at both the DMA HW capability
+register and the PHY devices MCD registers.
+To enter in Tx LPI mode the driver needs to have a software timer
+that enable and disable the LPI mode when there is nothing to be
+transmitted.
+
+7) TODO:
o XGMAC is not supported.
- o Add the EEE - Energy Efficient Ethernet
o Add the PTP - precision time protocol
diff --git a/Documentation/networking/vxge.txt b/Documentation/networking/vxge.txt
index d2e2997e6fa..bb76c667a47 100644
--- a/Documentation/networking/vxge.txt
+++ b/Documentation/networking/vxge.txt
@@ -91,10 +91,3 @@ v) addr_learn_en
virtualization environment.
Valid range: 0,1 (disabled, enabled respectively)
Default: 0
-
-4) Troubleshooting:
--------------------
-
-To resolve an issue with the source code or X3100 series adapter, please collect
-the statistics, register dumps using ethool, relevant logs and email them to
-support@neterion.com.
diff --git a/Documentation/networking/vxlan.txt b/Documentation/networking/vxlan.txt
new file mode 100644
index 00000000000..6d993510f09
--- /dev/null
+++ b/Documentation/networking/vxlan.txt
@@ -0,0 +1,47 @@
+Virtual eXtensible Local Area Networking documentation
+======================================================
+
+The VXLAN protocol is a tunnelling protocol that is designed to
+solve the problem of limited number of available VLAN's (4096).
+With VXLAN identifier is expanded to 24 bits.
+
+It is a draft RFC standard, that is implemented by Cisco Nexus,
+Vmware and Brocade. The protocol runs over UDP using a single
+destination port (still not standardized by IANA).
+This document describes the Linux kernel tunnel device,
+there is also an implantation of VXLAN for Openvswitch.
+
+Unlike most tunnels, a VXLAN is a 1 to N network, not just point
+to point. A VXLAN device can either dynamically learn the IP address
+of the other end, in a manner similar to a learning bridge, or the
+forwarding entries can be configured statically.
+
+The management of vxlan is done in a similar fashion to it's
+too closest neighbors GRE and VLAN. Configuring VXLAN requires
+the version of iproute2 that matches the kernel release
+where VXLAN was first merged upstream.
+
+1. Create vxlan device
+ # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1
+
+This creates a new device (vxlan0). The device uses the
+the multicast group 239.1.1.1 over eth1 to handle packets where
+no entry is in the forwarding table.
+
+2. Delete vxlan device
+ # ip link delete vxlan0
+
+3. Show vxlan info
+ # ip -d link show vxlan0
+
+It is possible to create, destroy and display the vxlan
+forwarding table using the new bridge command.
+
+1. Create forwarding table entry
+ # bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0
+
+2. Delete forwarding table entry
+ # bridge fdb delete 00:17:42:8a:b4:05 dev vxlan0
+
+3. Show forwarding table
+ # bridge fdb show dev vxlan0