From d6e640f9766e2fb9aa3853b4ff19e4d7d5d7e373 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp
Date: Tue, 8 May 2012 22:20:33 +0200
Subject: can: update documentation wording error frames -> error messages

As Heinz-Juergen Oertel pointed out, 'CAN error frames' are already a
defined term for the CAN protocol violation indication on the wire.
To avoid confusion with the error messages created by CAN drivers and
available via CAN RAW sockets, update the documentation and change the
naming from 'error frames' to 'error messages' or 'error message frames'.

Signed-off-by: Oliver Hartkopp
Signed-off-by: Marc Kleine-Budde
---
 Documentation/networking/can.txt | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 56ca3b75376..28d9b14c34e 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -232,16 +232,16 @@ solution for a couple of reasons:
   arbitration problems and error frames caused by the different
   ECUs. The occurrence of detected errors are important for diagnosis
   and have to be logged together with the exact timestamp. For this
-  reason the CAN interface driver can generate so called Error Frames
-  that can optionally be passed to the user application in the same
-  way as other CAN frames. Whenever an error on the physical layer
+  reason the CAN interface driver can generate so called Error Message
+  Frames that can optionally be passed to the user application in the
+  same way as other CAN frames. Whenever an error on the physical layer
   or the MAC layer is detected (e.g. by the CAN controller) the driver
-  creates an appropriate error frame. Error frames can be requested by
-  the user application using the common CAN filter mechanisms. Inside
-  this filter definition the (interested) type of errors may be
-  selected. The reception of error frames is disabled by default.
-  The format of the CAN error frame is briefly described in the Linux
-  header file "include/linux/can/error.h".
+  creates an appropriate error message frame. Error messages frames can
+  be requested by the user application using the common CAN filter
+  mechanisms. Inside this filter definition the (interested) type of
+  errors may be selected. The reception of error messages is disabled
+  by default. The format of the CAN error message frame is briefly
+  described in the Linux header file "include/linux/can/error.h".

 4. How to use Socket CAN
 ------------------------
@@ -383,7 +383,7 @@ solution for a couple of reasons:
   defaults are set at RAW socket binding time:

   - The filters are set to exactly one filter receiving everything
-  - The socket only receives valid data frames (=> no error frames)
+  - The socket only receives valid data frames (=> no error message frames)
   - The loopback of sent CAN frames is enabled (see chapter 3.2)
   - The socket does not receive its own sent frames (in loopback mode)

@@ -434,7 +434,7 @@ solution for a couple of reasons:
   4.1.2 RAW socket option CAN_RAW_ERR_FILTER

   As described in chapter 3.4 the CAN interface driver can generate so
-  called Error Frames that can optionally be passed to the user
+  called Error Message Frames that can optionally be passed to the user
   application in the same way as other CAN frames. The possible
   errors are divided into different error classes that may be filtered
   using the appropriate error mask. To register for every possible
@@ -527,7 +527,7 @@ solution for a couple of reasons:
   rcvlist_all - list for unfiltered entries (no filter operations)
   rcvlist_eff - list for single extended frame (EFF) entries
-  rcvlist_err - list for error frames masks
+  rcvlist_err - list for error message frames masks
   rcvlist_fil - list for mask/value filters
   rcvlist_inv - list for mask/value filters (inverse semantic)
   rcvlist_sff - list for single standard frame (SFF) entries
@@ -784,13 +784,13 @@ solution for a couple of reasons:
     $ ip link set canX type can restart-ms 100

   Alternatively, the application may realize the "bus-off" condition
-  by monitoring CAN error frames and do a restart when appropriate with
-  the command:
+  by monitoring CAN error message frames and do a restart when
+  appropriate with the command:

     $ ip link set canX type can restart

-  Note that a restart will also create a CAN error frame (see also
-  chapter 3.4).
+  Note that a restart will also create a CAN error message frame (see
+  also chapter 3.4).

   6.6 Supported CAN hardware
-- 
cgit v1.2.3

From d0daebc3d622f95db181601cb0c4a0781f74f758 Mon Sep 17 00:00:00 2001
From: Thomas Graf
Date: Tue, 12 Jun 2012 00:44:01 +0000
Subject: ipv4: Add interface option to enable routing of 127.0.0.0/8

Routing of 127/8 is traditionally forbidden: we consider packets from
that address block martian when routing and do not process corresponding
ARP requests.

This is a sane default but renders a huge address space practically
unusable.

The RFC states that no address within the 127/8 block should ever
appear on any network anywhere, but it does not forbid the use of such
addresses outside of the loopback device in particular, for example to
address a pool of virtual guests behind a load balancer.

This patch adds a new interface option 'route_localnet' enabling
routing of the 127/8 address block and processing of ARP requests on a
specific interface.
Note that for the feature to work, the default local route covering
127/8 dev lo needs to be removed.

Example:
  $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
  $ ip route del 127.0.0.0/8 dev lo table local
  $ ip addr add 127.1.0.1/16 dev eth0
  $ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)

Signed-off-by: Thomas Graf
Acked-by: Neil Horman
Signed-off-by: David S. Miller
---
 Documentation/networking/ip-sysctl.txt | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6f896b94abd..99d0e0504d6 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -862,6 +862,11 @@ accept_local - BOOLEAN
 	local interfaces over the wire and have them accepted properly.
 	default FALSE

+route_localnet - BOOLEAN
+	Do not consider loopback addresses as martian source or destination
+	while routing. This enables the use of 127/8 for local routing purposes.
+	default FALSE
+
 rp_filter - INTEGER
 	0 - No source validation.
 	1 - Strict mode as defined in RFC3704 Strict Reverse Path
-- 
cgit v1.2.3

From f8214865a55f805e65c33350bc0f1eb46dd8433d Mon Sep 17 00:00:00 2001
From: Martin Hundebøll
Date: Fri, 20 Apr 2012 17:02:45 +0200
Subject: batman-adv: Add get_ethtool_stats() support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Added additional counters in a bat_stats structure, which are exported
through the ethtool API. The counters are specific to batman-adv and
include:
  forwarded packets and bytes
  management packets and bytes (aggregated OGMs at this point)
  translation table packets

New counters are added by extending "enum bat_counters" in types.h and
adding corresponding descriptive string(s) to bat_counters_strings in
soft-iface.c.
Counters are increased by calling batadv_add_counter() and incremented
by one by calling batadv_inc_counter().

Signed-off-by: Martin Hundebøll
Signed-off-by: Sven Eckelmann
---
 Documentation/networking/batman-adv.txt | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index 75a592365af..8f3ae4a6147 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -211,6 +211,11 @@ The debug output can be changed at runtime using the file
 will enable debug messages for when routes change.

+Counters for different types of packets entering and leaving the
+batman-adv module are available through ethtool:
+
+# ethtool --statistics bat0
+
 BATCTL
 ------

-- 
cgit v1.2.3

From ea53fe0c667ad3cae61d4d71d2be41908ac5c0a4 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp
Date: Sat, 16 Jun 2012 12:01:58 +0200
Subject: canfd: update documentation according to CAN FD extensions

Signed-off-by: Oliver Hartkopp
Signed-off-by: Marc Kleine-Budde
---
 Documentation/networking/can.txt | 154 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 146 insertions(+), 8 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index a06741898f2..820f55344ed 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -22,7 +22,8 @@ This file contains
       4.1.2 RAW socket option CAN_RAW_ERR_FILTER
       4.1.3 RAW socket option CAN_RAW_LOOPBACK
       4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
-      4.1.5 RAW socket returned message flags
+      4.1.5 RAW socket option CAN_RAW_FD_FRAMES
+      4.1.6 RAW socket returned message flags
     4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
     4.3 connected transport protocols (SOCK_SEQPACKET)
     4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -41,7 +42,8 @@ This file contains
       6.5.1 Netlink interface to set/get devices properties
       6.5.2 Setting the CAN bit-timing
       6.5.3 Starting and stopping the CAN network device
-    6.6 supported CAN hardware
+    6.6 CAN FD (flexible data rate) driver support
+    6.7 supported CAN hardware

   7 Socket CAN resources

@@ -273,7 +275,7 @@ solution for a couple of reasons:
     struct can_frame {
             canid_t can_id;  /* 32 bit CAN_ID + EFF/RTR/ERR flags */
-            __u8    can_dlc; /* data length code: 0 .. 8 */
+            __u8    can_dlc; /* frame payload length in byte (0 .. 8) */
             __u8    data[8] __attribute__((aligned(8)));
     };

@@ -375,6 +377,51 @@ solution for a couple of reasons:
     nbytes = sendto(s, &frame, sizeof(struct can_frame),
                     0, (struct sockaddr*)&addr, sizeof(addr));

+  Remark about CAN FD (flexible data rate) support:
+
+  Generally the handling of CAN FD is very similar to the formerly described
+  examples. The new CAN FD capable CAN controllers support two different
+  bitrates for the arbitration phase and the payload phase of the CAN FD frame
+  and up to 64 bytes of payload. This extended payload length breaks all the
+  kernel interfaces (ABI) which heavily rely on the CAN frame with fixed eight
+  bytes of payload (struct can_frame) like the CAN_RAW socket. Therefore e.g.
+  the CAN_RAW socket supports a new socket option CAN_RAW_FD_FRAMES that
+  switches the socket into a mode that allows the handling of CAN FD frames
+  and (legacy) CAN frames simultaneously (see section 4.1.5).
+
+  The struct canfd_frame is defined in include/linux/can.h:
+
+    struct canfd_frame {
+            canid_t can_id;  /* 32 bit CAN_ID + EFF/RTR/ERR flags */
+            __u8    len;     /* frame payload length in byte (0 .. 64) */
+            __u8    flags;   /* additional flags for CAN FD */
+            __u8    __res0;  /* reserved / padding */
+            __u8    __res1;  /* reserved / padding */
+            __u8    data[64] __attribute__((aligned(8)));
+    };
+
+  The struct canfd_frame and the existing struct can_frame have the can_id,
+  the payload length and the payload data at the same offset inside their
+  structures. This allows to handle the different structures very similar.
+  When the content of a struct can_frame is copied into a struct canfd_frame
+  all structure elements can be used as-is - only the data[] becomes extended.
+
+  When introducing the struct canfd_frame it turned out that the data length
+  code (DLC) of the struct can_frame was used as a length information as the
+  length and the DLC has a 1:1 mapping in the range of 0 .. 8. To preserve
+  the easy handling of the length information the canfd_frame.len element
+  contains a plain length value from 0 .. 64. So both canfd_frame.len and
+  can_frame.can_dlc are equal and contain a length information and no DLC.
+  For details about the distinction of CAN and CAN FD capable devices and
+  the mapping to the bus-relevant data length code (DLC), see chapter 6.6.
+
+  The length of the two CAN(FD) frame structures define the maximum transfer
+  unit (MTU) of the CAN(FD) network interface and skbuff data length. Two
+  definitions are specified for CAN specific MTUs in include/linux/can.h :
+
+  #define CAN_MTU   (sizeof(struct can_frame))   == 16  => 'legacy' CAN frame
+  #define CANFD_MTU (sizeof(struct canfd_frame)) == 72  => CAN FD frame
+
   4.1 RAW protocol sockets with can_filters (SOCK_RAW)

   Using CAN_RAW sockets is extensively comparable to the commonly
@@ -472,7 +519,69 @@ solution for a couple of reasons:
     setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                &recv_own_msgs, sizeof(recv_own_msgs));

-  4.1.5 RAW socket returned message flags
+  4.1.5 RAW socket option CAN_RAW_FD_FRAMES
+
+  CAN FD support in CAN_RAW sockets can be enabled with a new socket option
+  CAN_RAW_FD_FRAMES which is off by default. When the new socket option is
+  not supported by the CAN_RAW socket (e.g. on older kernels), switching the
+  CAN_RAW_FD_FRAMES option returns the error -ENOPROTOOPT.
+
+  Once CAN_RAW_FD_FRAMES is enabled the application can send both CAN frames
+  and CAN FD frames. OTOH the application has to handle CAN and CAN FD frames
+  when reading from the socket.
+
+    CAN_RAW_FD_FRAMES enabled:  CAN_MTU and CANFD_MTU are allowed
+    CAN_RAW_FD_FRAMES disabled: only CAN_MTU is allowed (default)
+
+  Example:
+    [ remember: CANFD_MTU == sizeof(struct canfd_frame) ]
+
+    struct canfd_frame cfd;
+
+    nbytes = read(s, &cfd, CANFD_MTU);
+
+    if (nbytes == CANFD_MTU) {
+            printf("got CAN FD frame with length %d\n", cfd.len);
+            /* cfd.flags contains valid data */
+    } else if (nbytes == CAN_MTU) {
+            printf("got legacy CAN frame with length %d\n", cfd.len);
+            /* cfd.flags is undefined */
+    } else {
+            fprintf(stderr, "read: invalid CAN(FD) frame\n");
+            return 1;
+    }
+
+    /* the content can be handled independently from the received MTU size */
+
+    printf("can_id: %X data length: %d data: ", cfd.can_id, cfd.len);
+    for (i = 0; i < cfd.len; i++)
+            printf("%02X ", cfd.data[i]);
+
+  When reading with size CANFD_MTU only returns CAN_MTU bytes that have
+  been received from the socket a legacy CAN frame has been read into the
+  provided CAN FD structure. Note that the canfd_frame.flags data field is
+  not specified in the struct can_frame and therefore it is only valid in
+  CANFD_MTU sized CAN FD frames.
+
+  As long as the payload length is <=8 the received CAN frames from CAN FD
+  capable CAN devices can be received and read by legacy sockets too. When
+  user-generated CAN FD frames have a payload length <=8 these can be send
+  by legacy CAN network interfaces too. Sending CAN FD frames with payload
+  length > 8 to a legacy CAN network interface returns an -EMSGSIZE error.
+
+  Implementation hint for new CAN applications:
+
+  To build a CAN FD aware application use struct canfd_frame as basic CAN
+  data structure for CAN_RAW based applications. When the application is
+  executed on an older Linux kernel and switching the CAN_RAW_FD_FRAMES
+  socket option returns an error: No problem. You'll get legacy CAN frames
+  or CAN FD frames and can process them the same way.
+
+  When sending to CAN devices make sure that the device is capable to handle
+  CAN FD frames by checking if the device maximum transfer unit is CANFD_MTU.
+  The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall.
+
+  4.1.6 RAW socket returned message flags

   When using recvmsg() call, the msg->msg_flags may contain following flags:
@@ -573,10 +682,13 @@ solution for a couple of reasons:
     dev->type  = ARPHRD_CAN; /* the netdevice hardware type */
     dev->flags = IFF_NOARP;  /* CAN has no arp */

-    dev->mtu   = sizeof(struct can_frame);
+    dev->mtu = CAN_MTU; /* sizeof(struct can_frame) -> legacy CAN interface */

-  The struct can_frame is the payload of each socket buffer in the
-  protocol family PF_CAN.
+    or alternative, when the controller supports CAN with flexible data rate:
+    dev->mtu = CANFD_MTU; /* sizeof(struct canfd_frame) -> CAN FD interface */
+
+  The struct can_frame or struct canfd_frame is the payload of each socket
+  buffer (skbuff) in the protocol family PF_CAN.

   6.2 local loopback of sent frames

@@ -792,7 +904,33 @@ solution for a couple of reasons:
   Note that a restart will also create a CAN error message frame (see
   also chapter 3.4).

-  6.6 Supported CAN hardware
+  6.6 CAN FD (flexible data rate) driver support
+
+  CAN FD capable CAN controllers support two different bitrates for the
+  arbitration phase and the payload phase of the CAN FD frame. Therefore a
+  second bittiming has to be specified in order to enable the CAN FD bitrate.
+
+  Additionally CAN FD capable CAN controllers support up to 64 bytes of
+  payload. The representation of this length in can_frame.can_dlc and
+  canfd_frame.len for userspace applications and inside the Linux network
+  layer is a plain value from 0 .. 64 instead of the CAN 'data length code'.
+  The data length code was a 1:1 mapping to the payload length in the legacy
+  CAN frames anyway. The payload length to the bus-relevant DLC mapping is
+  only performed inside the CAN drivers, preferably with the helper
+  functions can_dlc2len() and can_len2dlc().
+
+  The CAN netdevice driver capabilities can be distinguished by the network
+  devices maximum transfer unit (MTU):
+
+  MTU = 16 (CAN_MTU)   => sizeof(struct can_frame)   => 'legacy' CAN device
+  MTU = 72 (CANFD_MTU) => sizeof(struct canfd_frame) => CAN FD capable device
+
+  The CAN device MTU can be retrieved e.g. with a SIOCGIFMTU ioctl() syscall.
+  N.B. CAN FD capable devices can also handle and send legacy CAN frames.
+
+  FIXME: Add details about the CAN FD controller configuration when available.
+
+  6.7 Supported CAN hardware

   Please check the "Kconfig" file in "drivers/net/can" to get an actual
   list of the support CAN hardware. On the Socket CAN project website
-- 
cgit v1.2.3

From 5051c94bb3998ff24bf07ae3b72dca30f85962f8 Mon Sep 17 00:00:00 2001
From: Sjur Brændeland
Date: Mon, 25 Jun 2012 07:49:40 +0000
Subject: Documentation/networking/caif: Update documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update drawing and remove description of old features.
Add HSI and USB link layers to the drawing.

Reported-by: Joerg Reisenweber
Signed-off-by: Sjur Brændeland
Signed-off-by: David S. Miller
---
 Documentation/networking/caif/Linux-CAIF.txt | 91 +++++++++-------------------
 1 file changed, 27 insertions(+), 64 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/caif/Linux-CAIF.txt b/Documentation/networking/caif/Linux-CAIF.txt
index e52fd62bef3..0aa4bd381be 100644
--- a/Documentation/networking/caif/Linux-CAIF.txt
+++ b/Documentation/networking/caif/Linux-CAIF.txt
@@ -19,60 +19,36 @@ and host. Currently, UART and Loopback are available for Linux.
 Architecture:
 ------------
 The implementation of CAIF is divided into:
-* CAIF Socket Layer, Kernel API, and Net Device.
+* CAIF Socket Layer and GPRS IP Interface.
 * CAIF Core Protocol Implementation
 * CAIF Link Layer, implemented as NET devices.


     RTNL
      !
-     !    +------+   +------+   +------+
-     !   +------+!  +------+!  +------+!
-     !   ! Sock !!  !Kernel!!  ! Net  !!
-     !   ! API  !+  ! API  !+  ! Dev  !+   <- CAIF Client APIs
-     !   +------+   +------!   +------+
-     !      !          !          !
-     !      +----------!----------+
-     !              +------+             <- CAIF Protocol Implementation
-     +------->      ! CAIF !
-                    ! Core !
-                    +------+
-              +--------!--------+
-              !                 !
-           +------+          +-----+
-           !      !          ! TTY !     <- Link Layer (Net Devices)
-           +------+          +-----+
-
-
-Using the Kernel API
-----------------------
-The Kernel API is used for accessing CAIF channels from the
-kernel.
-The user of the API has to implement two callbacks for receive
-and control.
-The receive callback gives a CAIF packet as a SKB. The control
-callback will
-notify of channel initialization complete, and flow-on/flow-
-off.
-
-
-  struct caif_device caif_dev = {
-    .caif_config = {
-     .name = "MYDEV"
-     .type = CAIF_CHTY_AT
-    }
-    .receive_cb = my_receive,
-    .control_cb = my_control,
-  };
-  caif_add_device(&caif_dev);
-  caif_transmit(&caif_dev, skb);
-
-See the caif_kernel.h for details about the CAIF kernel API.
+     !          +------+    +------+
+     !         +------+!   +------+!
+     !         !  IP  !!   !Socket!!
+     +-------> !interf!+   ! API  !+     <- CAIF Client APIs
+     !         +------+    +------!
+     !            !           !
+     !            +-----------+
+     !                        !
+     !                     +------+      <- CAIF Core Protocol
+     !                     ! CAIF !
+     !                     ! Core !
+     !                     +------+
+     !             +----------!---------+
+     !             !          !         !
+     !          +------+   +-----+   +------+
+     +-->       ! HSI  !   ! TTY !   ! USB  !     <- Link Layer (Net Devices)
+                +------+   +-----+   +------+
+

 I M P L E M E N T A T I O N
 ===========================
-===========================
+

 CAIF Core Protocol Layer
 =========================================
@@ -88,17 +64,13 @@ The Core CAIF implementation contains:
  - Simple implementation of CAIF.
  - Layered architecture (a la Streams), each layer in the CAIF
    specification is implemented in a separate c-file.
- - Clients must implement PHY layer to access physical HW
-   with receive and transmit functions.
  - Clients must call configuration function to add PHY layer.
  - Clients must implement CAIF layer to consume/produce
    CAIF payload with receive and transmit functions.
  - Clients must call configuration function to add and connect the
    Client layer.
  - When receiving / transmitting CAIF Packets (cfpkt), ownership is passed
-   to the called function (except for framing layers' receive functions
-   or if a transmit function returns an error, in which case the caller
-   must free the packet).
+   to the called function (except for framing layers' receive function)

 Layered Architecture
 --------------------
@@ -109,11 +81,6 @@ Implementation. The support functions include:
          CAIF Packet has functions for creating, destroying and adding content
          and for adding/extracting header and trailers to protocol packets.

- - CFLST CAIF list implementation.
-
- - CFGLUE CAIF Glue. Contains OS Specifics, such as memory
-   allocation, endianness, etc.
-
 The CAIF Protocol implementation contains:

  - CFCNFG CAIF Configuration layer. Configures the CAIF Protocol
@@ -128,7 +95,7 @@ The CAIF Protocol implementation contains:
        control and remote shutdown requests.

  - CFVEI CAIF VEI layer. Handles CAIF AT Channels on VEI (Virtual
-       External Interface). This layer encodes/decodes VEI frames.
+       External Interface). This layer encodes/decodes VEI frames.

  - CFDGML CAIF Datagram layer. Handles CAIF Datagram layer (IP traffic),
        encodes/decodes Datagram frames.
@@ -170,7 +137,7 @@ The CAIF Protocol implementation contains:
          +---------+     +---------+
          !               !
          +---------+     +---------+
-         |         |     | Serial  |
+         |         |     | Serial  |
          |         |     | CFSERL  |
          +---------+     +---------+
@@ -186,24 +153,20 @@ In this layered approach the following "rules" apply.
 		 layer->dn->transmit(layer->dn, packet);


-Linux Driver Implementation
+CAIF Socket and IP interface
 ===========================

-Linux GPRS Net Device and CAIF socket are implemented on top of the
-CAIF Core protocol. The Net device and CAIF socket have an instance of
+The IP interface and CAIF socket API are implemented on top of the
+CAIF Core protocol. The IP Interface and CAIF socket have an instance of
 'struct cflayer', just like the CAIF Core protocol stack.
 Net device and Socket implement the 'receive()' function defined by
 'struct cflayer', just like the rest of the CAIF stack. In this way, transmit and
 receive of packets is handled as by the rest of the layers: the 'dn->transmit()'
 function is called in order to transmit data.

-The layer on top of the CAIF Core implementation is
-sometimes referred to as the "Client layer".
-
-
 Configuration of Link Layer
 ---------------------------
-The Link Layer is implemented as Linux net devices (struct net_device).
+The Link Layer is implemented as Linux network devices (struct net_device).
 Payload handling and registration is done using standard Linux mechanisms.

 The CAIF Protocol relies on a loss-less link layer without implementing
-- 
cgit v1.2.3

From c801e3cc1925e02fa7213889306d4d77e6ad1550 Mon Sep 17 00:00:00 2001
From: "David S. Miller"
Date: Sat, 30 Jun 2012 22:39:27 -0700
Subject: ipv4: Clarify in docs that accept_local requires rp_filter.

Signed-off-by: David S. Miller
---
 Documentation/networking/ip-sysctl.txt | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 99d0e0504d6..47b6c79e9b0 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -857,9 +857,14 @@ accept_source_route - BOOLEAN
 	FALSE (host)

 accept_local - BOOLEAN
-	Accept packets with local source addresses. In combination with
-	suitable routing, this can be used to direct packets between two
-	local interfaces over the wire and have them accepted properly.
+	Accept packets with local source addresses. In combination
+	with suitable routing, this can be used to direct packets
+	between two local interfaces over the wire and have them
+	accepted properly.
+
+	rp_filter must be set to a non-zero value in order for
+	accept_local to have an effect.
+
 	default FALSE

 route_localnet - BOOLEAN
-- 
cgit v1.2.3

From 0ec2ccd0804ebb57a860c59d056a3f420c4f8028 Mon Sep 17 00:00:00 2001
From: Giuseppe CAVALLARO
Date: Wed, 27 Jun 2012 21:14:36 +0000
Subject: stmmac: update the driver Documentation and add EEE

This patch updates the stmmac's documentation, adding some missing
files to the section used to describe the internal driver's structure.
The patch also adds a new section to describe the EEE support.

Signed-off-by: Giuseppe Cavallaro
Signed-off-by: David S. Miller
---
 Documentation/networking/stmmac.txt | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 5cb9a197246..c676b9cedbd 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -257,9 +257,11 @@ reset procedure etc).
  o Makefile
  o stmmac_main.c: main network device driver;
  o stmmac_mdio.c: mdio functions;
+ o stmmac_pci: PCI driver;
+ o stmmac_platform.c: platform driver
  o stmmac_ethtool.c: ethtool support;
  o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
-   Only tested on ST40 platforms based.
+   (only tested on ST40 platforms based);
  o stmmac.h: private driver structure;
  o common.h: common definitions and VFTs;
  o descs.h: descriptor structure definitions;
@@ -269,9 +271,11 @@ reset procedure etc).
  o dwmac100_core: MAC 100 core and dma code;
  o dwmac100_dma.c: dma funtions for the MAC chip;
  o dwmac1000.h: specific header file for the MAC;
- o dwmac_lib.c: generic DMA functions shared among chips
- o enh_desc.c: functions for handling enhanced descriptors
- o norm_desc.c: functions for handling normal descriptors
+ o dwmac_lib.c: generic DMA functions shared among chips;
+ o enh_desc.c: functions for handling enhanced descriptors;
+ o norm_desc.c: functions for handling normal descriptors;
+ o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes;
+ o mmc_core.c/mmc.h: Management MAC Counters;

 5) Debug Information

@@ -304,7 +308,27 @@ All these are only useful during the developing stage
 and should never enabled inside the code for general usage.
 In fact, these can generate an huge amount of debug messages.

-6) TODO:
+6) Energy Efficient Ethernet
+
+Energy Efficient Ethernet(EEE) enables IEEE 802.3 MAC sublayer along
+with a family of Physical layer to operate in the Low power Idle(LPI)
+mode. The EEE mode supports the IEEE 802.3 MAC operation at 100Mbps,
+1000Mbps & 10Gbps.
+
+The LPI mode allows power saving by switching off parts of the
+communication device functionality when there is no data to be
+transmitted & received. The system on both the side of the link can
+disable some functionalities & save power during the period of low-link
+utilization. The MAC controls whether the system should enter or exit
+the LPI mode & communicate this to PHY.
+
+As soon as the interface is opened, the driver verifies if the EEE can
+be supported. This is done by looking at both the DMA HW capability
+register and the PHY devices MCD registers.
+To enter in Tx LPI mode the driver needs to have a software timer
+that enable and disable the LPI mode when there is nothing to be
+transmitted.
+
+7) TODO:
  o XGMAC is not supported.
- o Add the EEE - Energy Efficient Ethernet
  o Add the PTP - precision time protocol
-- 
cgit v1.2.3

From c0589fa78ae534acb741370872c4e13578d2f164 Mon Sep 17 00:00:00 2001
From: Jon Mason
Date: Mon, 9 Jul 2012 14:07:57 +0000
Subject: vxge/s2io: remove dead URLs

URLs to neterion.com and s2io.com no longer resolve. Remove all
references to these URLs in the driver source and documentation.

Signed-off-by: Jon Mason
Signed-off-by: David S. Miller
---
 Documentation/networking/s2io.txt | 14 ++------------
 Documentation/networking/vxge.txt |  7 -------
 2 files changed, 2 insertions(+), 19 deletions(-)

(limited to 'Documentation/networking')

diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt
index 4be0c039edb..d2a9f43b554 100644
--- a/Documentation/networking/s2io.txt
+++ b/Documentation/networking/s2io.txt
@@ -136,16 +136,6 @@ For more information, please review the AMD8131 errata at
 http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
 26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf

-6. Available Downloads
-Neterion "s2io" driver in Red Hat and Suse 2.6-based distributions is kept up
-to date, also the latest "s2io" code (including support for 2.4 kernels) is
-available via "Support" link on the Neterion site: http://www.neterion.com.
-
-For Xframe User Guide (Programming manual), visit ftp site ns1.s2io.com,
-user: linuxdocs password: HALdocs
-
-7. Support
+6. Support
 For further support please contact either your 10GbE Xframe NIC vendor (IBM,
-HP, SGI etc.) or click on the "Support" link on the Neterion site:
-http://www.neterion.com.
-
+HP, SGI etc.)
diff --git a/Documentation/networking/vxge.txt b/Documentation/networking/vxge.txt
index d2e2997e6fa..bb76c667a47 100644
--- a/Documentation/networking/vxge.txt
+++ b/Documentation/networking/vxge.txt
@@ -91,10 +91,3 @@ v) addr_learn_en
 	virtualization environment.
 	Valid range: 0,1 (disabled, enabled respectively)
 	Default: 0
-
-4) Troubleshooting:
--------------------
-
-To resolve an issue with the source code or X3100 series adapter, please collect
-the statistics, register dumps using ethool, relevant logs and email them to
-support@neterion.com.
-- 
cgit v1.2.3

From 46d3ceabd8d98ed0ad10f20c595ca784e34786c5 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Wed, 11 Jul 2012 05:50:31 +0000
Subject: tcp: TCP Small Queues

This introduces TSQ (TCP Small Queues).

The TSQ goal is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), to reduce RTT and cwnd bias, part of the
bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit, allowing
no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given
time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2: it can help to reduce
latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender:
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.

As skb destructor cannot restart xmit itself (as qdisc lock might be
taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet Cc: Dave Taht Cc: Tom Herbert Cc: Matt Mathis Cc: Yuchung Cheng Cc: Nandita Dukkipati Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 47b6c79e9b0..e20c17a7d34 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN Documentation/networking/tcp-thin.txt Default: 0 +tcp_limit_output_bytes - INTEGER + Controls TCP Small Queue limit per tcp socket. + TCP bulk sender tends to increase packets in flight until it + gets losses notifications. With SNDBUF autotuning, this can + result in a large amount of packets queued in qdisc/device + on the local machine, hurting latency of other flows, for + typical pfifo_fast qdiscs. + tcp_limit_output_bytes limits the number of bytes on qdisc + or device to reduce artificial RTT/cwnd and reduce bufferbloat. + Note: For GSO/TSO enabled flows, we try to have at least two + packets in flight. 
Reducing tcp_limit_output_bytes might also + reduce the size of individual GSO packet (64KB being the max) + Default: 131072 + UDP variables: udp_mem - vector of 3 INTEGERs: min, pressure, max -- cgit v1.2.3 From 282f23c6ee343126156dd41218b22ece96d747e3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 17 Jul 2012 10:13:05 +0200 Subject: tcp: implement RFC 5961 3.2 Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet Cc: Kiran Kumar Kella Signed-off-by: David S. 
Miller --- Documentation/networking/ip-sysctl.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index e20c17a7d34..e1e021594cf 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -565,6 +565,11 @@ tcp_limit_output_bytes - INTEGER reduce the size of individual GSO packet (64KB being the max) Default: 131072 +tcp_challenge_ack_limit - INTEGER + Limits number of Challenge ACK sent per second, as recommended + in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) + Default: 100 + UDP variables: udp_mem - vector of 3 INTEGERs: min, pressure, max -- cgit v1.2.3 From 8427b2acfdd5e6c554fb7ad1fbccf53a24a08454 Mon Sep 17 00:00:00 2001 From: stephen hemminger Date: Thu, 19 Jul 2012 07:01:07 +0000 Subject: bridge: update documentation references Update the references to bridge utilities and web pages to current locations Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller --- Documentation/networking/bridge.txt | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt index a7ba5e4e2c9..a27cb6214ed 100644 --- a/Documentation/networking/bridge.txt +++ b/Documentation/networking/bridge.txt @@ -1,7 +1,14 @@ In order to use the Ethernet bridging functionality, you'll need the -userspace tools. These programs and documentation are available -at http://www.linuxfoundation.org/en/Net:Bridge. The download page is -http://prdownloads.sourceforge.net/bridge. +userspace tools. 
+ +Documentation for Linux bridging is on: + http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge + +The bridge-utilities are maintained at: + git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git + +Additionally, the iproute2 utilities can be used to configure +bridge devices. If you still have questions, don't hesitate to post to the mailing list (more info https://lists.linux-foundation.org/mailman/listinfo/bridge). -- cgit v1.2.3 From cf60af03ca4e71134206809ea892e49b92a88896 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 19 Jul 2012 06:43:09 +0000 Subject: net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) sendmsg() (or sendto()) with MSG_FASTOPEN combines connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For a blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed, like a connect() call. It returns the same kinds of errno values as connect() if the TCP handshake fails. For a non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if a cookie is available. If a cookie is not available, it transmits a data-less SYN packet with the Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on a connecting or connected socket will result in errno values similar to those of repeated connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng Acked-by: Eric Dumazet Signed-off-by: David S.
Miller --- Documentation/networking/ip-sysctl.txt | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index e1e021594cf..03964e08818 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -468,6 +468,17 @@ tcp_syncookies - BOOLEAN SYN flood warnings in logs not being really flooded, your server is seriously misconfigured. +tcp_fastopen - INTEGER + Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data + in the opening SYN packet. To use this feature, the client application + must not use connect(). Instead, it should use sendmsg() or sendto() + with MSG_FASTOPEN flag which performs a TCP handshake automatically. + + The values (bitmap) are: + 1: Enables sending data in the opening SYN on the client + + Default: 0 + tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value -- cgit v1.2.3 From 67da22d23fa6f3324e03bcd0580b914b2e4afbf3 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 19 Jul 2012 06:43:11 +0000 Subject: net-tcp: Fast Open client - cookie-less mode In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng Acked-by: Eric Dumazet Signed-off-by: David S. 
Miller --- Documentation/networking/ip-sysctl.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 03964e08818..5f3ef7f7fce 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -476,6 +476,8 @@ tcp_fastopen - INTEGER The values (bitmap) are: 1: Enables sending data in the opening SYN on the client + 5: Enables sending data in the opening SYN on the client regardless + of cookie availability. Default: 0 -- cgit v1.2.3 From efaac3bf087b1a6cec28f2a041e01c874d65390c Mon Sep 17 00:00:00 2001 From: Leo Alterman Date: Fri, 20 Jul 2012 14:51:07 -0700 Subject: openvswitch: Fix typo in documentation. Signed-off-by: Leo Alterman Signed-off-by: Jesse Gross --- Documentation/networking/openvswitch.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt index b8a048b8df3..8fa2dd1e792 100644 --- a/Documentation/networking/openvswitch.txt +++ b/Documentation/networking/openvswitch.txt @@ -118,7 +118,7 @@ essentially like this, ignoring metadata: Naively, to add VLAN support, it makes sense to add a new "vlan" flow key attribute to contain the VLAN tag, then continue to decode the encapsulated headers beyond the VLAN tag using the existing field -definitions. With this change, an TCP packet in VLAN 10 would have a +definitions. With this change, a TCP packet in VLAN 10 would have a flow key much like this: eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) 
-- cgit v1.2.3 From 5aa93bcf66f4af094d6f11096e81d5501a0b4ba5 Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Sat, 21 Jul 2012 07:56:07 +0000 Subject: sctp: Implement quick failover draft from tsvwg I've recently seen several attempts to do quick failover of sctp transports by reducing various retransmit timers and counters. While it's possible to implement a faster failover on multihomed sctp associations, it's not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, let's implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and quickly avoid using them until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe it's in compliance with the above draft and works well. Signed-off-by: Neil Horman CC: Vlad Yasevich CC: Sridhar Samudrala CC: "David S. Miller" CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 5f3ef7f7fce..406a5226220 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1440,6 +1440,20 @@ path_max_retrans - INTEGER Default: 5 +pf_retrans - INTEGER + The number of retransmissions that will be attempted on a given path + before traffic is redirected to an alternate transport (should one + exist). Note this is distinct from path_max_retrans, as a path that + passes the pf_retrans threshold can still be used. It's only + deprioritized when a transmission path is selected by the stack.
This + setting is primarily used to enable fast failover mechanisms without + having to reduce path_max_retrans to a very low value. See: + http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt + for details. Note also that a value of pf_retrans > path_max_retrans + disables this feature + + Default: 0 + rto_initial - INTEGER The initial round trip timeout value in milliseconds that will be used in calculating round trip times. This is the initial time interval -- cgit v1.2.3 From f8b72d36d2eb94824d8445efdd706bf037570f88 Mon Sep 17 00:00:00 2001 From: Rick Jones Date: Fri, 20 Jul 2012 10:51:37 +0000 Subject: net-next: minor cleanups for bonding documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The section titled "Configuring Bonding for Maximum Throughput" is actually section twelve not thirteen, and there are a couple of words spelled incorrectly. Signed-off-by: Rick Jones Reviewed-by: Nicolas de Pesloüan Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index bfea8a33890..6b1c7110534 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1210,7 +1210,7 @@ options, you may wish to use the "max_bonds" module parameter, documented above. To create multiple bonding devices with differing options, it is -preferrable to use bonding parameters exported by sysfs, documented in the +preferable to use bonding parameters exported by sysfs, documented in the section below. For versions of bonding without sysfs support, the only means to @@ -1950,7 +1950,7 @@ access to fail over to. 
Additionally, the bonding load balance modes support link monitoring of their members, so if individual links fail, the load will be rebalanced across the remaining devices. - See Section 13, "Configuring Bonding for Maximum Throughput" + See Section 12, "Configuring Bonding for Maximum Throughput" for information on configuring bonding with one peer device. 11.2 High Availability in a Multiple Switch Topology @@ -2620,7 +2620,7 @@ be found at: https://lists.sourceforge.net/lists/listinfo/bonding-devel - Discussions regarding the developpement of the bonding driver take place + Discussions regarding the development of the bonding driver take place on the main Linux network mailing list, hosted at vger.kernel.org. The list address is: -- cgit v1.2.3 From 0c7462a2351b4cc502f326aad7fedd04909928be Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 30 Jul 2012 07:14:29 +0000 Subject: ipv4: remove rt_cache_rebuild_count After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 6 ------ 1 file changed, 6 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 406a5226220..ca447b35b83 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -48,12 +48,6 @@ min_adv_mss - INTEGER The advertised MSS depends on the first hop route MTU, but will never be lower than this setting. -rt_cache_rebuild_count - INTEGER - The per net-namespace route cache emergency rebuild threshold. 
Any net-namespace having its route cache rebuilt due to - a hash bucket chain being too long more than this many times - will have its route caching disabled - IP Fragmentation: ipfrag_high_thresh - INTEGER -- cgit v1.2.3 From 6556bfde65b1d4bea29eb2e1566398676792eaaa Mon Sep 17 00:00:00 2001 From: Dirk Gouders Date: Fri, 10 Aug 2012 01:24:51 +0000 Subject: netconsole.txt: revision of examples for the receiver of kernel messages There are at least 4 implementations of netcat, and the BSD-based one is the only one in which the listening port has to be specified without the -p switch. Jan Engelhardt suggested adding an example for socat(1). Signed-off-by: Dirk Gouders Signed-off-by: Cong Wang Signed-off-by: David S. Miller --- Documentation/networking/netconsole.txt | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt index 8d022073e3e..2e9e0ae2cd4 100644 --- a/Documentation/networking/netconsole.txt +++ b/Documentation/networking/netconsole.txt @@ -51,8 +51,23 @@ Built-in netconsole starts immediately after the TCP stack is initialized and attempts to bring up the supplied dev at the supplied address. -The remote host can run either 'netcat -u -l -p ', -'nc -l -u ' or syslogd. +The remote host has several options to receive the kernel messages, +for example: + +1) syslogd + +2) netcat + + On distributions using a BSD-based netcat version (e.g.
Fedora, + openSUSE and Ubuntu) the listening port must be specified without + the -p switch: + + 'nc -u -l -p ' / 'nc -u -l ' or + 'netcat -u -l -p ' / 'netcat -u -l ' + +3) socat + + 'socat udp-recv: -' Dynamic reconfiguration: ======================== -- cgit v1.2.3 From 6b923cb7188d46905f43fa84210c4c3e5f9cd8fb Mon Sep 17 00:00:00 2001 From: John Eaglesham Date: Tue, 21 Aug 2012 20:43:35 +0000 Subject: bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. 
In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham Signed-off-by: John Eaglesham Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6b1c7110534..10a015c384b 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,22 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 formula is + + hash = (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + XOR (source MAC XOR destination MAC)) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +788,29 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. 
- The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + hash = (source port XOR dest port) XOR + ((source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4)) + + ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. -- cgit v1.2.3 From 536a23f119e35e58c762a219bafd398ba2ed7980 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Mon, 18 Jun 2012 18:39:26 +0200 Subject: batman-adv: Add the backbone gateway list to debugfs This is especially useful if there are no claims yet, but we still want to know which gateways are using bridge loop avoidance in the network. 
Signed-off-by: Simon Wunderlich Signed-off-by: Antonio Quartulli --- Documentation/networking/batman-adv.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index 8f3ae4a6147..a173d2a879f 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -75,9 +75,10 @@ folder: There is a special folder for debugging information: -# ls /sys/kernel/debug/batman_adv/bat0/ -# bla_claim_table log socket transtable_local -# gateways originators transtable_global vis_data +# ls /sys/kernel/debug/batman_adv/bat0/ +# bla_backbone_table log transtable_global +# bla_claim_table originators transtable_local +# gateways socket vis_data Some of the files contain all sort of status information regard- ing the mesh network. For example, you can view the table of -- cgit v1.2.3 From 6c9ff979d1921e9fd05d89e1383121c2503759b9 Mon Sep 17 00:00:00 2001 From: Alex Bergmann Date: Fri, 31 Aug 2012 02:48:31 +0000 Subject: tcp: Increase timeout for SYN segments Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") changed the initRTO from 3secs to 1sec in accordance with RFC6298 (formerly RFC2988bis). This reduced the time until the last SYN retransmission packet gets sent from 93secs to 31secs. RFC1122 states that the retransmission should be done for at least 3 minutes, but this seems to be quite high. "However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course." This patch increases TCP_SYN_RETRIES to 6, providing a retransmission window of 63secs.
The comments for SYN and SYNACK retries have also been updated to describe the current settings. The same goes for the documentation file "Documentation/networking/ip-sysctl.txt". Signed-off-by: Alexander Bergmann Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ca447b35b83..d64e53124b8 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -439,7 +439,9 @@ tcp_stdurg - BOOLEAN tcp_synack_retries - INTEGER Number of times SYNACKs for a passive TCP connection attempt will be retransmitted. Should not be higher than 255. Default value - is 5, which corresponds to ~180seconds. + is 5, which corresponds to 31seconds till the last retransmission + with the current initial RTO of 1second. With this the final timeout + for a passive TCP connection will happen after 63seconds. tcp_syncookies - BOOLEAN Only valid when the kernel was compiled with CONFIG_SYNCOOKIES @@ -478,7 +480,9 @@ tcp_fastopen - INTEGER tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value - is 5, which corresponds to ~180seconds. + is 6, which corresponds to 63seconds till the last retransmission + with the current initial RTO of 1second. With this the final timeout + for an active TCP connection attempt will happen after 127seconds. tcp_timestamps - BOOLEAN Enable timestamps as defined in RFC1323. -- cgit v1.2.3 From d56631a66c0d0c9d662abfb38cd1f6326eeebd7c Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Thu, 30 Aug 2012 05:50:43 +0000 Subject: net:stmmac: Remove bus_id from mdio platform data.
This patch removes bus_id from the mdio platform data. The reason to remove bus_id is that the stmmac mdio bus_id is always the same as the stmmac bus-id, so there is no point in passing it in a different variable. Also, the stmmac ethernet driver connects to the phy with the bus_id passed in its platform data. So, having a single bus-id is much simpler. Signed-off-by: Srinivas Kandagatla Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 5 ----- 1 file changed, 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index c676b9cedbd..ef9ee71b4d7 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -173,7 +173,6 @@ Where: For MDIO bus The we have: struct stmmac_mdio_bus_data { - int bus_id; int (*phy_reset)(void *priv); unsigned int phy_mask; int *irqs; @@ -181,7 +180,6 @@ For MDIO bus The we have: }; Where: - o bus_id: bus identifier; o phy_reset: hook to reset the phy device attached to the bus. o phy_mask: phy mask passed when register the MDIO bus within the driver. o irqs: list of IRQs, one per PHY. @@ -230,9 +228,6 @@ there are two MAC cores: one MAC is for MDIO Bus/PHY emulation with fixed_link support. static struct stmmac_mdio_bus_data stmmac1_mdio_bus = { - .bus_id = 1, - | - |-> phy device on the bus_id 1 .phy_reset = phy_reset; | |-> function to provide the phy_reset on this board -- cgit v1.2.3 From 1046716368979dee857a2b8a91c4a8833f21b9cb Mon Sep 17 00:00:00 2001 From: Jerry Chu Date: Fri, 31 Aug 2012 12:29:11 +0000 Subject: tcp: TCP Fast Open Server - header & support functions This patch adds all the necessary data structures and support functions to implement the TFO server side. It also documents a number of flags for the sysctl_tcp_fastopen knob, and adds a few Linux extension MIBs. In addition, it includes the following: 1.
a new TCP_FASTOPEN socket option an application must call to supply a max backlog allowed in order to enable TFO on its listener. 2. A number of key data structures: "fastopen_rsk" in tcp_sock - for a big socket to access its request_sock for retransmission and ack processing purpose. It is non-NULL iff 3WHS not completed. "fastopenq" in request_sock_queue - points to a per Fast Open listener data structure "fastopen_queue" to keep track of qlen (# of outstanding Fast Open requests) and max_qlen, among other things. "listener" in tcp_request_sock - to point to the original listener for book-keeping purpose, i.e., to maintain qlen against max_qlen as part of defense against IP spoofing attack. 3. various data structure and functions, many in tcp_fastopen.c, to support server side Fast Open cookie operations, including /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying. Signed-off-by: H.K. Jerry Chu Cc: Yuchung Cheng Cc: Neal Cardwell Cc: Eric Dumazet Cc: Tom Herbert Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d64e53124b8..c7fc1072494 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -467,16 +467,31 @@ tcp_syncookies - BOOLEAN tcp_fastopen - INTEGER Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data in the opening SYN packet. To use this feature, the client application - must not use connect(). Instead, it should use sendmsg() or sendto() - with MSG_FASTOPEN flag which performs a TCP handshake automatically. - - The values (bitmap) are: - 1: Enables sending data in the opening SYN on the client - 5: Enables sending data in the opening SYN on the client regardless - of cookie availability. 
+ must use sendmsg() or sendto() with MSG_FASTOPEN flag rather than + connect() to perform a TCP handshake automatically. + + The values (bitmap) are + 1: Enables sending data in the opening SYN on the client. + 2: Enables TCP Fast Open on the server side, i.e., allowing data in + a SYN packet to be accepted and passed to the application before + the 3-way handshake finishes. + 4: Send data in the opening SYN regardless of cookie availability and + without a cookie option. + 0x100: Accept SYN data w/o validating the cookie. + 0x200: Accept data-in-SYN w/o any cookie option present. + 0x400/0x800: Enable Fast Open on all listeners regardless of the + TCP_FASTOPEN socket option. The two different flags designate two + different ways of setting max_qlen without the TCP_FASTOPEN socket + option. Default: 0 + Note that the client & server side Fast Open flags (1 and 2 + respectively) must also be enabled before the rest of the flags can take + effect. + + See include/net/tcp.h and the code for more details. + tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value -- cgit v1.2.3 From d342894c5d2f8c7df194c793ec4059656e09ca31 Mon Sep 17 00:00:00 2001 From: stephen hemminger Date: Mon, 1 Oct 2012 12:32:35 +0000 Subject: vxlan: virtual extensible lan This is an implementation of Virtual eXtensible Local Area Network as described in draft RFC: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02 The driver integrates Virtual Tunnel Endpoint (VTEP) functionality that learns MAC-to-IP address mappings. This implementation has only been tested against the Linux userspace implementation using TAP, not against other vendors' equipment. Signed-off-by: Stephen Hemminger Signed-off-by: David S.
Miller --- Documentation/networking/vxlan.txt | 47 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 Documentation/networking/vxlan.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/vxlan.txt b/Documentation/networking/vxlan.txt new file mode 100644 index 00000000000..5b34b762d7d --- /dev/null +++ b/Documentation/networking/vxlan.txt @@ -0,0 +1,47 @@ +Virtual eXtensible Local Area Networking documentation +====================================================== + +The VXLAN protocol is a tunnelling protocol that is designed to +solve the problem of the limited number of available VLANs (4096). +With VXLAN, the identifier is expanded to 24 bits. + +It is a draft RFC standard that is implemented by Cisco Nexus, +VMware and Brocade. The protocol runs over UDP using a single +destination port (still not standardized by IANA). +This document describes the Linux kernel tunnel device; +there is also an implementation of VXLAN for Openvswitch. + +Unlike most tunnels, a VXLAN is a 1 to N network, not just point +to point. A VXLAN device can either dynamically learn the IP address +of the other end, in a manner similar to a learning bridge, or the +forwarding entries can be configured statically. + +The management of vxlan is done in a similar fashion to its +two closest neighbors, GRE and VLAN. Configuring VXLAN requires +the version of iproute2 that matches the kernel release +where VXLAN was first merged upstream. + +1. Create vxlan device + # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1 + +This creates a new device (vxlan0). The device uses the +multicast group 239.1.1.1 over eth1 to handle packets where +no entry is in the forwarding table. + +2. Delete vxlan device + # ip link delete vxlan0 + +3. Show vxlan info + # ip -d link show vxlan0 + +It is possible to create, destroy and display the vxlan +forwarding table using the new bridge command. + +1.
Create forwarding table entry + # bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0 + +2. Delete forwarding table entry + # bridge fdb delete 00:17:42:8a:b4:05 + +3. Show forwarding table + # bridge fdb show dev vxlan0 -- cgit v1.2.3 From 3c68198e75111a905ac2412be12bf7b29099729b Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Wed, 24 Oct 2012 09:20:03 +0000 Subject: sctp: Make hmac algorithm selection for cookie generation dynamic Currently sctp allows for the optional use of md5 or sha1 hmac algorithms to generate cookie values when establishing new connections via two build time config options. There's no real reason to make this a static selection. We can add a sysctl that allows for the dynamic selection of these algorithms at run time, with the default value determined by the corresponding crypto library availability. This comes in handy when, for example, running a system in FIPS mode, where use of md5 is disallowed, but SHA1 is permitted. Note: This new sysctl has no corresponding socket option to select the cookie hmac algorithm. I chose not to implement that intentionally, as RFC 6458 contains no option for this value, and I opted not to pollute the socket option namespace. Change notes: v2) * Updated subject to have the proper sctp prefix as per Dave M. * Replaced default selection options with new options that allow developers to explicitly select available hmac algs at build time as per suggestion by Vlad Y. Signed-off-by: Neil Horman CC: Vlad Yasevich CC: "David S. Miller" CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich Signed-off-by: David S.
Miller --- Documentation/networking/ip-sysctl.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index c7fc1072494..98ac0d7552a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1514,6 +1514,20 @@ cookie_preserve_enable - BOOLEAN Default: 1 +cookie_hmac_alg - STRING + Select the hmac algorithm used when generating the cookie value sent by + a listening sctp socket to a connecting client in the INIT-ACK chunk. + Valid values are: + * md5 + * sha1 + * none + Ability to assign md5 or sha1 as the selected alg is predicated on the + configuarion of those algorithms at build time (CONFIG_CRYPTO_MD5 and + CONFIG_CRYPTO_SHA1). + + Default: Dependent on configuration. MD5 if available, else SHA1 if + available, else none. + rcvbuf_policy - INTEGER Determines if the receive buffer is attributed to the socket or to association. SCTP supports the capability to create multiple -- cgit v1.2.3 From 0e861a3c4ffef56822e1d51c355e5020deaeaf5a Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Mon, 1 Oct 2012 09:57:36 +0200 Subject: batman-adv: Distributed ARP Table - add a new debug log level A new log level has been added to concentrate messages regarding DAT: ARP snooping, requests, response and DHT related messages. The new log level is named BATADV_DBG_DAT Signed-off-by: Antonio Quartulli --- Documentation/networking/batman-adv.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index a173d2a879f..c1d82047a4b 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -203,7 +203,8 @@ abled during run time. 
Following log_levels are defined: 2 - Enable messages related to route added / changed / deleted 4 - Enable messages related to translation table operations 8 - Enable messages related to bridge loop avoidance -15 - enable all messages +16 - Enable messages related to DAT, ARP snooping and parsing +31 - Enable all messages The debug output can be changed at runtime using the file /sys/class/net/bat0/mesh/log_level. e.g. -- cgit v1.2.3 From 5920cd3a41f1aefc30e9ce86384fc2fe9f5fe0c0 Mon Sep 17 00:00:00 2001 From: Paul Chavent Date: Tue, 6 Nov 2012 23:10:47 +0000 Subject: packet: tx_ring: allow the user to choose tx data offset The tx data offset of the packet mmap tx ring used to be: (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)) The problem is that, with a SOCK_RAW socket, the payload (14 bytes after the beginning of the user data) is misaligned. This patch lets the user specify an offset for the tx data if desired. Set the socket option PACKET_TX_HAS_OFF to 1, then set tp_net (for SOCK_DGRAM) or tp_mac (for SOCK_RAW) in each frame of the tx ring. Signed-off-by: Paul Chavent Signed-off-by: David S.
Miller --- Documentation/networking/packet_mmap.txt | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 1c08a4b0981..7cd879eba5d 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -163,6 +163,19 @@ As capture, each frame contains two parts: A complete tutorial is available at: http://wiki.gnu-log.net/ +By default, the user should put data at : + frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll) + +So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW), +the beginning of the user data will be at : + frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + +If you wish to put user data at a custom offset from the beginning of +the frame (for payload alignment with SOCK_RAW mode for instance) you +can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order +to make this work it must be enabled previously with setsockopt() +and the PACKET_TX_HAS_OFF option. + -------------------------------------------------------------------------------- + PACKET_MMAP settings -------------------------------------------------------------------------------- -- cgit v1.2.3 From d1ee40f96036e838f0849dd31c16e548a904176c Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Thu, 8 Nov 2012 02:37:01 +0000 Subject: doc: packet_mmap: update doc to implementation status MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This improves the packet_mmap.txt document in the following ways: * Add initial information about different TPACKET versions * Add initial information about packet fanout * Add pointer to BPF document (since this also could be of interest) * 'Fix' minor, rather cosmetic things Information partially taken from related commit messages. 
Reported-by: Ronny Meeus Signed-off-by: Daniel Borkmann Cc: Ulisses Alonso Camaró Cc: Johann Baudy Signed-off-by: David S. Miller --- Documentation/networking/packet_mmap.txt | 233 +++++++++++++++++++++++++++---- 1 file changed, 209 insertions(+), 24 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 7cd879eba5d..94444b152fb 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -3,9 +3,9 @@ -------------------------------------------------------------------------------- This file documents the mmap() facility available with the PACKET -socket interface on 2.4 and 2.6 kernels. This type of sockets is used for -capture network traffic with utilities like tcpdump or any other that needs -raw access to network interface. +socket interface on 2.4/2.6/3.x kernels. This type of sockets is used for +i) capture network traffic with utilities like tcpdump, ii) transmit network +traffic, or any other that needs raw access to network interface. You can find the latest version of this document at: http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap @@ -21,19 +21,18 @@ Please send your comments to + Why use PACKET_MMAP -------------------------------------------------------------------------------- -In Linux 2.4/2.6 if PACKET_MMAP is not enabled, the capture process is very -inefficient. It uses very limited buffers and requires one system call -to capture each packet, it requires two if you want to get packet's -timestamp (like libpcap always does). +In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process is very +inefficient. It uses very limited buffers and requires one system call to +capture each packet, it requires two if you want to get packet's timestamp +(like libpcap always does). In the other hand PACKET_MMAP is very efficient. 
PACKET_MMAP provides a size configurable circular buffer mapped in user space that can be used to either send or receive packets. This way reading packets just needs to wait for them, most of the time there is no need to issue a single system call. Concerning transmission, multiple packets can be sent through one system call to get the -highest bandwidth. -By using a shared buffer between the kernel and the user also has the benefit -of minimizing packet copies. +highest bandwidth. By using a shared buffer between the kernel and the user +also has the benefit of minimizing packet copies. It's fine to use PACKET_MMAP to improve the performance of the capture and transmission process, but it isn't everything. At least, if you are capturing @@ -41,7 +40,8 @@ at high speeds (this is relative to the cpu speed), you should check if the device driver of your network interface card supports some sort of interrupt load mitigation or (even better) if it supports NAPI, also make sure it is enabled. For transmission, check the MTU (Maximum Transmission Unit) used and -supported by devices of your network. +supported by devices of your network. CPU IRQ pinning of your network interface +card can also be an advantage. -------------------------------------------------------------------------------- + How to use mmap() to improve capture process @@ -87,9 +87,7 @@ the following process: socket creation and destruction is straight forward, and is done the same way with or without PACKET_MMAP: -int fd; - -fd= socket(PF_PACKET, mode, htons(ETH_P_ALL)) + int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL)); where mode is SOCK_RAW for the raw interface were link level information can be captured or SOCK_DGRAM for the cooked @@ -180,7 +178,6 @@ and the PACKET_TX_HAS_OFF option. 
+ PACKET_MMAP settings -------------------------------------------------------------------------------- - To setup PACKET_MMAP from user level code is done with a call like - Capture process @@ -214,7 +211,6 @@ indeed, packet_set_ring checks that the following condition is true frames_per_block * tp_block_nr == tp_frame_nr - Lets see an example, with the following values: tp_block_size= 4096 @@ -240,7 +236,6 @@ be spawned across two blocks, so there are some details you have to take into account when choosing the frame_size. See "Mapping and use of the circular buffer (ring)". - -------------------------------------------------------------------------------- + PACKET_MMAP setting constraints -------------------------------------------------------------------------------- @@ -277,7 +272,6 @@ User space programs can include /usr/include/sys/user.h and The pagesize can also be determined dynamically with the getpagesize (2) system call. - Block number limit -------------------- @@ -297,7 +291,6 @@ called pg_vec, its size limits the number of blocks that can be allocated. v block #2 block #1 - kmalloc allocates any number of bytes of physically contiguous memory from a pool of pre-determined sizes. This pool of memory is maintained by the slab allocator which is at the end the responsible for doing the allocation and @@ -312,7 +305,6 @@ pointers to blocks is 131072/4 = 32768 blocks - PACKET_MMAP buffer size calculator ------------------------------------ @@ -353,7 +345,6 @@ and a value for of 2048 bytes. These parameters will yield and hence the buffer will have a 262144 MiB size. So it can hold 262144 MiB / 2048 bytes = 134217728 frames - Actually, this buffer size is not possible with an i386 architecture. Remember that the memory is allocated in kernel space, in the case of an i386 kernel's memory size is limited to 1GiB. @@ -385,7 +376,6 @@ the following (from include/linux/if_packet.h): - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16. 
- Pad to align to TPACKET_ALIGNMENT=16 */ - The following are conditions that are checked in packet_set_ring @@ -426,7 +416,6 @@ and the following flags apply: #define TP_STATUS_LOSING 4 #define TP_STATUS_CSUMNOTREADY 8 - TP_STATUS_COPY : This flag indicates that the frame (and associated meta information) has been truncated because it's larger than tp_frame_size. This packet can be @@ -475,7 +464,6 @@ packets are in the ring: It doesn't incur in a race condition to first check the status value and then poll for frames. - ++ Transmission process Those defines are also used for transmission: @@ -506,6 +494,196 @@ The user can also use poll() to check if a buffer is available: pfd.events = POLLOUT; retval = poll(&pfd, 1, timeout); +------------------------------------------------------------------------------- ++ What TPACKET versions are available and when to use them? +------------------------------------------------------------------------------- + + int val = tpacket_version; + setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val)); + getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val)); + +where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3. + +TPACKET_V1: + - Default if not otherwise specified by setsockopt(2) + - RX_RING, TX_RING available + - VLAN metadata information available for packets + (TP_STATUS_VLAN_VALID) + +TPACKET_V1 --> TPACKET_V2: + - Made 64 bit clean due to unsigned long usage in TPACKET_V1 + structures, thus this also works on 64 bit kernel with 32 bit + userspace and the like + - Timestamp resolution in nanoseconds instead of microseconds + - RX_RING, TX_RING available + - How to switch to TPACKET_V2: + 1. Replace struct tpacket_hdr by struct tpacket2_hdr + 2. Query header len and save + 3. Set protocol version to 2, set up ring as usual + 4. 
For getting the sockaddr_ll, + use (void *)hdr + TPACKET_ALIGN(hdrlen) instead of + (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr)) + +TPACKET_V2 --> TPACKET_V3: + - Flexible buffer implementation: + 1. Blocks can be configured with non-static frame-size + 2. Read/poll is at a block-level (as opposed to packet-level) + 3. Added poll timeout to avoid indefinite user-space wait + on idle links + 4. Added user-configurable knobs: + 4.1 block::timeout + 4.2 tpkt_hdr::sk_rxhash + - RX Hash data available in user space + - Currently only RX_RING available + +------------------------------------------------------------------------------- ++ AF_PACKET fanout mode +------------------------------------------------------------------------------- + +In the AF_PACKET fanout mode, packet reception can be load balanced among +processes. This also works in combination with mmap(2) on packet sockets. + +Minimal example code by David S. Miller (try things like "./test eth0 hash", +"./test eth0 lb", etc.): + +#include <stddef.h> +#include <stdlib.h> +#include <stdio.h> +#include <string.h> + +#include <sys/types.h> +#include <sys/wait.h> +#include <sys/socket.h> +#include <sys/ioctl.h> + +#include <unistd.h> + +#include <linux/if_ether.h> +#include <linux/if_packet.h> + +#include <net/if.h> + +static const char *device_name; +static int fanout_type; +static int fanout_id; + +#ifndef PACKET_FANOUT +# define PACKET_FANOUT 18 +# define PACKET_FANOUT_HASH 0 +# define PACKET_FANOUT_LB 1 +#endif + +static int setup_socket(void) +{ + int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP)); + struct sockaddr_ll ll; + struct ifreq ifr; + int fanout_arg; + + if (fd < 0) { + perror("socket"); + return -1; + } + + memset(&ifr, 0, sizeof(ifr)); + strcpy(ifr.ifr_name, device_name); + err = ioctl(fd, SIOCGIFINDEX, &ifr); + if (err < 0) { + perror("SIOCGIFINDEX"); + return -1; + } + + memset(&ll, 0, sizeof(ll)); + ll.sll_family = AF_PACKET; + ll.sll_ifindex = ifr.ifr_ifindex; + err = bind(fd, (struct sockaddr *) &ll, sizeof(ll)); + if (err < 0) { + perror("bind"); + return -1; + } + + fanout_arg = (fanout_id | (fanout_type << 16)); + err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT, + &fanout_arg, sizeof(fanout_arg)); + if (err) { + perror("setsockopt"); + return -1; + } + + return fd; +} + +static void fanout_thread(void) +{ + int fd = setup_socket(); + int limit = 10000; + + if (fd < 0) + exit(EXIT_FAILURE); + + while (limit-- > 0) { + char buf[1600]; + int err; + + err = read(fd, buf, sizeof(buf)); + if (err < 0) { + perror("read"); + exit(EXIT_FAILURE); + } + if ((limit % 10) == 0) + fprintf(stdout, "(%d) \n", getpid()); + } + + fprintf(stdout, "%d: Received 10000 packets\n", getpid()); + + close(fd); + exit(0); +} + +int main(int argc, char **argp) +{ + int fd, err; + int i; + + if (argc != 3) { + fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]); + return EXIT_FAILURE; + } + + if (!strcmp(argp[2], "hash")) + fanout_type = PACKET_FANOUT_HASH; + else if (!strcmp(argp[2], "lb")) + fanout_type = PACKET_FANOUT_LB; + else { + fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]); + exit(EXIT_FAILURE); + } + + device_name = argp[1]; + fanout_id = getpid() & 0xffff; + + for (i = 0; i < 4; i++) { + pid_t pid = fork(); + + switch (pid) { + case 0: + fanout_thread(); + + case -1: + perror("fork"); + exit(EXIT_FAILURE); + } + } + + for (i = 0; i < 4; i++) { + int status; + + wait(&status); + } + + return 0; +} + ------------------------------------------------------------------------------- + PACKET_TIMESTAMP ------------------------------------------------------------------------------- @@ -532,6 +710,13 @@ the networking stack is used (the behavior before this setting was added). See include/linux/net_tstamp.h and Documentation/networking/timestamping for more information on hardware timestamps.
+------------------------------------------------------------------------------- ++ Miscellaneous bits +------------------------------------------------------------------------------- + +- Packet sockets work well together with Linux socket filters, thus you also + might want to have a look at Documentation/networking/filter.txt + -------------------------------------------------------------------------------- + THANKS -------------------------------------------------------------------------------- -- cgit v1.2.3 From 73e212fc48890b552e4ae65b65c0e709f478879b Mon Sep 17 00:00:00 2001 From: Kirill Smelkov Date: Sat, 10 Nov 2012 07:12:36 +0000 Subject: doc/net: Fix typo in netdev-features.txt Signed-off-by: Kirill Smelkov Signed-off-by: David S. Miller --- Documentation/networking/netdev-features.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt index 4164f5c02e4..f310edec8a7 100644 --- a/Documentation/networking/netdev-features.txt +++ b/Documentation/networking/netdev-features.txt @@ -164,4 +164,4 @@ read the CRC recorded by the NIC on receipt of the packet. This requests that the NIC receive all possible frames, including errored frames (such as bad FCS, etc). This can be helpful when sniffing a link with bad packets on it. Some NICs may receive more packets if also put into normal -PROMISC mdoe. +PROMISC mode. -- cgit v1.2.3 From cc9b310165e7ea2f3dc90e1eea6ce57c9b7981d1 Mon Sep 17 00:00:00 2001 From: Zhi Yong Wu Date: Thu, 22 Nov 2012 00:10:01 +0000 Subject: vxlan: fix command usage in its doc Some commands in the example documentation don't work. This patch fixes them. Signed-off-by: Zhi Yong Wu Signed-off-by: David S.
Miller --- Documentation/networking/vxlan.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/vxlan.txt b/Documentation/networking/vxlan.txt index 5b34b762d7d..6d993510f09 100644 --- a/Documentation/networking/vxlan.txt +++ b/Documentation/networking/vxlan.txt @@ -32,7 +32,7 @@ no entry is in the forwarding table. # ip link delete vxlan0 3. Show vxlan info - # ip -d show vxlan0 + # ip -d link show vxlan0 It is possible to create, destroy and display the vxlan forwarding table using the new bridge command. @@ -41,7 +41,7 @@ forwarding table using the new bridge command. # bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0 2. Delete forwarding table entry - # bridge fdb delete 00:17:42:8a:b4:05 + # bridge fdb delete 00:17:42:8a:b4:05 dev vxlan0 3. Show forwarding table # bridge fdb show dev vxlan0 -- cgit v1.2.3 From f9e01b5565398e549a5d391ea2e62f7b6e806e3f Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Sun, 25 Nov 2012 23:10:45 +0000 Subject: stmmac: update the doc with new IRQ mitigation This patch updates the stmmac.txt adding some information about the new rx/tx mitigation schema adopted in the driver. Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. 
Miller --- Documentation/networking/stmmac.txt | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index ef9ee71b4d7..f9fa6db40a5 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -29,11 +29,9 @@ The kernel configuration option is STMMAC_ETH: dma_txsize: DMA tx ring size; buf_sz: DMA buffer size; tc: control the HW FIFO threshold; - tx_coe: Enable/Disable Tx Checksum Offload engine; watchdog: transmit timeout (in milliseconds); flow_ctrl: Flow control ability [on/off]; pause: Flow Control Pause Time; - tmrate: timer period (only if timer optimisation is configured). 3) Command line options Driver parameters can be also passed in command line by using: @@ -60,17 +58,19 @@ Then the poll method will be scheduled at some future point. The incoming packets are stored, by the DMA, in a list of pre-allocated socket buffers in order to avoid the memcpy (Zero-copy). -4.3) Timer-Driver Interrupt -Instead of having the device that asynchronously notifies the frame receptions, -the driver configures a timer to generate an interrupt at regular intervals. -Based on the granularity of the timer, the frames that are received by the -device will experience different levels of latency. Some NICs have dedicated -timer device to perform this task. STMMAC can use either the RTC device or the -TMU channel 2 on STLinux platforms. -The timers frequency can be passed to the driver as parameter; when change it, -take care of both hardware capability and network stability/performance impact. -Several performance tests on STM platforms showed this optimisation allows to -spare the CPU while having the maximum throughput. +4.3) Interrupt Mitigation +The driver is able to mitigate the number of its DMA interrupts +using NAPI for the reception on chips older than the 3.50. 
+New chips have an HW RX-Watchdog used for this mitigation. + +On Tx-side, the mitigation scheme is based on a SW timer that calls the +tx function (stmmac_tx) to reclaim the resource after transmitting the +frames. +There is also another parameter (like a threshold) used to program +the descriptors so as to avoid setting the interrupt-on-completion bit +when the frame is sent (xmit). + +Mitigation parameters can be tuned by ethtool. 4.4) WOL Wake up on Lan feature through Magic and Unicast frames are supported for the @@ -121,6 +121,7 @@ struct plat_stmmacenet_data { int bugged_jumbo; int pmt; int force_sf_dma_mode; + int riwt_off; void (*fix_mac_speed)(void *priv, unsigned int speed); void (*bus_setup)(void __iomem *ioaddr); int (*init)(struct platform_device *pdev); @@ -156,6 +157,7 @@ Where: o pmt: core has the embedded power module (optional). o force_sf_dma_mode: force DMA to use the Store and Forward mode instead of the Threshold. + o riwt_off: force-disable the RX watchdog feature and switch to NAPI mode. o fix_mac_speed: this callback is used for modifying some syscfg registers (on ST SoCs) according to the link speed negotiated by the physical layer . -- cgit v1.2.3 From 7e3a2dc52953f126103a36b33db1f57463fbbb8f Mon Sep 17 00:00:00 2001 From: Rick Jones Date: Wed, 28 Nov 2012 09:53:10 +0000 Subject: doc: make the description of how tcp_ecn works more explicit and clear Make the description of how tcp_ecn works a bit more explicit and clear. Signed-off-by: Rick Jones Signed-off-by: David S.
Miller --- Documentation/networking/ip-sysctl.txt | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 98ac0d7552a..c6d5fee888c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -199,15 +199,16 @@ tcp_early_retrans - INTEGER Default: 2 tcp_ecn - INTEGER - Enable Explicit Congestion Notification (ECN) in TCP. ECN is only - used when both ends of the TCP flow support it. It is useful to - avoid losses due to congestion (when the bottleneck router supports - ECN). + Control use of Explicit Congestion Notification (ECN) by TCP. + ECN is used only when both ends of the TCP connection indicate + support for it. This feature is useful in avoiding losses due + to congestion by allowing supporting routers to signal + congestion before having to drop packets. Possible values are: - 0 disable ECN - 1 ECN enabled - 2 Only server-side ECN enabled. If the other end does - not support ECN, behavior is like with ECN disabled. + 0 Disable ECN. Neither initiate nor accept ECN. + 1 Always request ECN on outgoing connection attempts. + 2 Enable ECN when requested by incomming connections + but do not request ECN on outgoing connections. Default: 2 tcp_fack - BOOLEAN -- cgit v1.2.3 From cc86802805b5d714a5dc80fe4edecaf1368b09ed Mon Sep 17 00:00:00 2001 From: Shan Wei Date: Tue, 4 Dec 2012 18:50:35 +0000 Subject: net: doc: add default value for neighbour parameters Signed-off-by: Shan Wei Signed-off-by: David S. 
Miller --- Documentation/networking/ip-sysctl.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index c6d5fee888c..0462a710530 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -30,16 +30,24 @@ neigh/default/gc_thresh3 - INTEGER Maximum number of neighbor entries allowed. Increase this when using large numbers of interfaces and when communicating with large numbers of directly-connected peers. + Default: 1024 neigh/default/unres_qlen_bytes - INTEGER The maximum number of bytes which may be used by packets queued for each unresolved address by other network layers. (added in linux 3.3) + Seting negative value is meaningless and will retrun error. + Default: 65536 Bytes(64KB) neigh/default/unres_qlen - INTEGER The maximum number of packets which may be queued for each unresolved address by other network layers. (deprecated in linux 3.3) : use unres_qlen_bytes instead. + Prior to linux 3.3, the default value is 3 which may cause + secluded packet loss. The current default value is calculated + according to default value of unres_qlen_bytes and true size of + packet. + Default: 31 mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. -- cgit v1.2.3 From 5d248c491b38d4f1b2a0bd7721241d68cd0b3067 Mon Sep 17 00:00:00 2001 From: Shan Wei Date: Thu, 6 Dec 2012 16:27:51 +0000 Subject: net: doc : use more suitable word 'unexpected' to replace 'secluded' 'secluded' is used to describe places, not suitable here. Suggested-by: Ben Hutchings Signed-off-by: Shan Wei Signed-off-by: David S. 
Miller --- Documentation/networking/ip-sysctl.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 0462a710530..1b830cac461 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -44,7 +44,7 @@ neigh/default/unres_qlen - INTEGER unresolved address by other network layers. (deprecated in linux 3.3) : use unres_qlen_bytes instead. Prior to linux 3.3, the default value is 3 which may cause - secluded packet loss. The current default value is calculated + unexpected packet loss. The current default value is calculated according to default value of unres_qlen_bytes and true size of packet. Default: 31 -- cgit v1.2.3 From d825da2ede50160e567e666ff43c89a403bf0193 Mon Sep 17 00:00:00 2001 From: Rick Jones Date: Mon, 10 Dec 2012 11:33:00 +0000 Subject: doc: Tighten-up and clarify description of tcp_fin_timeout The description for tcp_fin_timeout should be tighter and clearer. In addition to being tighter, we should make the spelling of the state name consistent with what utilities report, remove the now dated reference to 2.2 and put the default in the consistent place. Signed-off-by: Rick Jones Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 1b830cac461..dd52d516cb8 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -224,15 +224,14 @@ tcp_fack - BOOLEAN The value is not used, if tcp_sack is not enabled. tcp_fin_timeout - INTEGER - Time to hold socket in state FIN-WAIT-2, if it was closed - by our side. Peer can be broken and never close its side, - or even died unexpectedly. Default value is 60sec.
- Usual value used in 2.2 was 180 seconds, you may restore - it, but remember that if your machine is even underloaded WEB server, - you risk to overflow memory with kilotons of dead sockets, - FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1, - because they eat maximum 1.5K of memory, but they tend - to live longer. Cf. tcp_max_orphans. + The length of time an orphaned (no longer referenced by any + application) connection will remain in the FIN_WAIT_2 state + before it is aborted at the local end. While a perfectly + valid "receive only" state for an un-orphaned connection, an + orphaned connection in FIN_WAIT_2 state could otherwise wait + forever for the remote to close its end of the connection. + Cf. tcp_max_orphans + Default: 60 seconds tcp_frto - INTEGER Enables Forward RTO-Recovery (F-RTO) defined in RFC4138. -- cgit v1.2.3 From db2b620aa03d1301398dcba8b1097686bd82e65b Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Tue, 1 Jan 2013 00:35:31 +0000 Subject: ipv6: document ndisc_notify in networking/ip-sysctl.txt I slipped in a new sysctl without proper documentation. I would like to make up for this now. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index dd52d516cb8..ac1710ef21a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1331,6 +1331,12 @@ force_tllao - BOOLEAN race condition where the sender deletes the cached link-layer address prior to receiving a response to a previous solicitation." +ndisc_notify - BOOLEAN + Define mode for notification of address and device changes. + 0 - (default): do nothing + 1 - Generate unsolicited neighbour advertisements when device is brought + up or hardware address changes. 
+ icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. -- cgit v1.2.3 From 3b09adcb20c1e393a8721b1805f49dd8c1657563 Mon Sep 17 00:00:00 2001 From: stephen hemminger Date: Thu, 3 Jan 2013 07:50:29 +0000 Subject: ip-sysctl: fix spelling errors Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ac1710ef21a..dbca6618208 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -36,7 +36,7 @@ neigh/default/unres_qlen_bytes - INTEGER The maximum number of bytes which may be used by packets queued for each unresolved address by other network layers. (added in linux 3.3) - Seting negative value is meaningless and will retrun error. + Setting negative value is meaningless and will return error. Default: 65536 Bytes(64KB) neigh/default/unres_qlen - INTEGER @@ -215,7 +215,7 @@ tcp_ecn - INTEGER Possible values are: 0 Disable ECN. Neither initiate nor accept ECN. 1 Always request ECN on outgoing connection attempts. - 2 Enable ECN when requested by incomming connections + 2 Enable ECN when requested by incoming connections but do not request ECN on outgoing connections. Default: 2 @@ -503,7 +503,7 @@ tcp_fastopen - INTEGER tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 255. Default value - is 6, which corresponds to 63seconds till the last restransmission + is 6, which corresponds to 63seconds till the last retransmission with the current initial RTO of 1second. With this the final timeout for an active TCP connection attempt will happen after 127seconds. 
@@ -1536,7 +1536,7 @@ cookie_hmac_alg - STRING * sha1 * none Ability to assign md5 or sha1 as the selected alg is predicated on the - configuarion of those algorithms at build time (CONFIG_CRYPTO_MD5 and + configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and CONFIG_CRYPTO_SHA1). Default: Dependent on configuration. MD5 if available, else SHA1 if @@ -1554,7 +1554,7 @@ rcvbuf_policy - INTEGER blocking. 1: rcvbuf space is per association - 0: recbuf space is per socket + 0: rcvbuf space is per socket Default: 0 -- cgit v1.2.3