Commit Graph

3548 Commits

Author SHA1 Message Date
Catalin(ux) M. BOIE 6ea2edb3b6 IPv6 NAT: Do not drop DNATed 6to4/6rd packets
[ Upstream commit 7df37ff33dc122f7bd0614d707939fe84322d264 ]

When a router is doing DNAT for 6to4/6rd packets the latest
anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
6to4 and 6rd") will drop them because the IPv6 address embedded does
not match the IPv4 destination. This patch will allow them to pass by
testing if we have an address that matches on 6to4/6rd interface.  I
have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
Also, log the dropped packets (with rate limit).

Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:30 -07:00
Hannes Frederic Sowa f5a30bc1d6 ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
[ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ]

In the following scenario the socket is corked:
If the first UDP packet is larger then the mtu we try to append it to the
write queue via ip6_ufo_append_data. A following packet, which is smaller
than the mtu would be appended to the already queued up gso-skb via
plain ip6_append_data. This causes random memory corruptions.

In ip6_ufo_append_data we also have to be careful to not queue up the
same skb multiple times. So setup the gso frame only when no first skb
is available.

This also fixes a shortcoming where we add the current packet's length to
cork->length but return early because of a packet > mtu with dontfrag set
(instead of sutracting it again).

Found with trinity.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:30 -07:00
Ansis Atteka 68a9e70789 ip: generate unique IP identificator if local fragmentation is allowed
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ]

If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.

For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:30 -07:00
Ding Zhi fa42a01cb0 ip6_tunnels: raddr and laddr are inverted in nl msg
[ Upstream commit 0d2ede929f61783aebfb9228e4d32a0546ee4d23 ]

IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.

Introduced by c075b13098 (ip6tnl: advertise tunnel param via rtnl).

Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:29 -07:00
Daniel Borkmann 45d268c6b8 net: fib: fib6_add: fix potential NULL pointer dereference
[ Upstream commit ae7b4e1f213aa659aedf9c6ecad0bf5f0476e1e2 ]

When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
with an error in fn = fib6_add_1(), then error codes are encoded into
the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
write the error code into err and jump to out, hence enter the if(err)
condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:

  if (pn != fn && pn->leaf == rt)
    ...
  if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
    ...

Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn
evaluates to true and causes a NULL-pointer dereference on further
checks on pn. Fix it, by setting both NULL in error case, so that
pn != fn already evaluates to false and no further dereference
takes place.

This was first correctly implemented in 4a287eba2 ("IPv6 routing,
NLM_F_* flag support: REPLACE and EXCL flags support, warn about
missing CREATE flag"), but the bug got later on introduced by
188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()").

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Lin Ming <mlin@ss.pku.edu.cn>
Cc: Matti Vaittinen <matti.vaittinen@nsn.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:28 -07:00
Jiri Pirko 711b8c9684 ipv6/exthdrs: accept tlv which includes only padding
[ Upstream commit 8112b1fe071be01a28a774ed55909e6f4b29712d ]

In rfc4942 and rfc2460 I cannot find anything which would implicate to
drop packets which have only padding in tlv.

Current behaviour breaks TAHI Test v6LC.1.2.6.

Problem was intruduced in:
9b905fe684 "ipv6/exthdrs: strict Pad1 and PadN check"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:28 -07:00
Daniel Borkmann a22eb149b1 net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv
[ Upstream commit 3a1c756590633c0e86df606e5c618c190926a0df ]

In tcp_v6_do_rcv() code, when processing pkt options, we soley work
on our skb clone opt_skb that we've created earlier before entering
tcp_rcv_established() on our way. However, only in condition ...

  if (np->rxopt.bits.rxtclass)
    np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb));

... we work on skb itself. As we extract every other information out
of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can
already be released by tcp_rcv_established() earlier on. When we try
to access it in ipv6_hdr(), we will dereference freed skb.

[ Bug added by commit 4c507d2897 ("net: implement IP_RECVTOS for
  IP_PKTOPTIONS") ]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:56 -07:00
Hannes Frederic Sowa ef1f8bcdc2 ipv6: fix null pointer dereference in __ip6addrlbl_add
[ Upstream commit 639739b5e609a5074839bb22fc061b37baa06269 ]

Commit b67bfe0d42 ("hlist: drop
the node parameter from iterators") changed the behavior of
hlist_for_each_entry_safe to leave the p argument NULL.

Fix this up by tracking the last argument.

Reported-by: Michele Baldessari <michele@acksyn.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Michele Baldessari <michele@acksyn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:56 -07:00
Jiri Bohac 2aae409672 ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO
[ Upstream commit 61e76b178dbe7145e8d6afa84bb4ccea71918994 ]

RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination
unreachable) messages:
        5 - Source address failed ingress/egress policy
	6 - Reject route to destination

Now they are treated as protocol error and icmpv6_err_convert() converts them
to EPROTO.

RFC 4443 says:
	"Codes 5 and 6 are more informative subsets of code 1."

Treat codes 5 and 6 as code 1 (EACCES)

Btw, connect() returning -EPROTO confuses firefox, so that fallback to
other/IPv4 addresses does not work:
https://bugzilla.mozilla.org/show_bug.cgi?id=910773

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:56 -07:00
Thomas Graf 4b7ead801d ipv6: Don't depend on per socket memory for neighbour discovery messages
[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ]

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.

This patch has orginally been posted by Eric Dumazet in a modified
form.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:56 -07:00
Hannes Frederic Sowa a829a28873 ipv6: drop packets with multiple fragmentation headers
[ Upstream commit f46078cfcd77fa5165bf849f5e568a7ac5fa569c ]

It is not allowed for an ipv6 packet to contain multiple fragmentation
headers. So discard packets which were already reassembled by
fragmentation logic and send back a parameter problem icmp.

The updates for RFC 6980 will come in later, I have to do a bit more
research here.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:55 -07:00
Hannes Frederic Sowa ad558a2970 ipv6: remove max_addresses check from ipv6_create_tempaddr
[ Upstream commit 4b08a8f1bd8cb4541c93ec170027b4d0782dab52 ]

Because of the max_addresses check attackers were able to disable privacy
extensions on an interface by creating enough autoconfigured addresses:

<http://seclists.org/oss-sec/2012/q4/292>

But the check is not actually needed: max_addresses protects the
kernel to install too many ipv6 addresses on an interface and guards
addrconf_prefix_rcv to install further addresses as soon as this limit
is reached. We only generate temporary addresses in direct response of
a new address showing up. As soon as we filled up the maximum number of
addresses of an interface, we stop installing more addresses and thus
also stop generating more temp addresses.

Even if the attacker tries to generate a lot of temporary addresses
by announcing a prefix and removing it again (lifetime == 0) we won't
install more temp addresses, because the temporary addresses do count
to the maximum number of addresses, thus we would stop installing new
autoconfigured addresses when the limit is reached.

This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
possible).

Thanks to Ding Tianhong to bring this topic up again.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: George Kargiotakis <kargig@void.gr>
Cc: P J P <ppandit@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:55 -07:00
Hannes Frederic Sowa 24e8ac721e ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match
[ Upstream commit 3e3be275851bc6fc90bfdcd732cd95563acd982b ]

In case a subtree did not match we currently stop backtracking and return
NULL (root table from fib_lookup). This could yield in invalid routing
table lookups when using subtrees.

Instead continue to backtrack until a valid subtree or node is found
and return this match.

Also remove unneeded NULL check.

Reported-by: Teco Boot <teco@inf-net.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Lamparter <equinox@diac24.net>
Cc: <boutier@pps.univ-paris-diderot.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14 06:54:54 -07:00
Jaime Lopez c433362e33 ipv6: Enable new mode proxy_ndp == 2
This new mode allows Neighbor discovery packets to be sent to userspace
without regards to the state of the forwarding setting. Without this, NDP
packets addressed to the host are still received, but those for other
addresses are not.

Enabling this mode allows NDP proxying to be performed from userspace.

Change-Id: I69b7a7c0c42e3253c42d6f2e163c0ce1d848aed6
Signed-off-by: Jaime Lopez <jaimel@codeaurora.org>
2013-09-04 14:52:42 -07:00
Abhijeet Dharmapurikar f7b8e3517b msm_rmnet: merge support for RAWIP msm_rmnet device
Add to support for msm_rmnet device using ARPHRD_RAWIP.

Change-Id: Ie1e5433f440b26b644cccb18083ef325129f7942
Acked-by: Andrew Richardson <randrew@qualcomm.com>
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2013-09-04 14:52:42 -07:00
Stephen Boyd 38d8910730 Merge branch 'qandroid-3.10' into msm-3.10
* qandroid-3.10: (636 commits)
  netfilter: xt_qtaguid: Protect iface list access with necessary lock
  HID: magicmouse: Fix build warning
  USB: gadget: mtp: Fix OUT endpoint request length usage in read
  USB: gadget: f_mtp: Fix using tx buffer pointer
  msm: Fix race condition in domain lookup
  msm: Add null-pointer checks for domains
  base: sync: increase size of sync_timeline name
  USB: gadget: mtp: Add module parameters for Tx transfer length
  msm: iommu: Lock the genpool allocation
  gpu: ion: fix page offset in dma_buf_kmap()
  gpu: ion: Fix bug in ion_system_heap map_user
  gpu: ion: Only map as much of the vma as the user requested
  gpu: ion: use vmalloc to allocate page array to map kernel
  gpu: ion: Remove dead comments
  gpu: ion: Minimize allocation fallback delay
  mmc: sd: Set the card removed if card detect fails
  gpu: ion: don't fault in individual pages for the CP heap
  gpu: ion: do not ask for compound pages in system heap
  gpu: ion: Modify the system heap to try to allocate large/huge pages
  gpu: ion: Set the dma_address of the sg list at alloc time
  ...

Conflicts:
	arch/arm/Kconfig
	arch/arm/include/asm/hardware/cache-l2x0.h
	arch/arm/mm/cache-l2x0.c
	drivers/mmc/card/block.c
	drivers/usb/gadget/udc-core.c
2013-09-04 14:46:18 -07:00
Hannes Frederic Sowa 69af51cb5c ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
[ Upstream commit 905a6f96a1b18e490a75f810d733ced93c39b0e5 ]

Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
which leads to a panic (from Srivatsa S. Bhat):

BUG: unable to handle kernel paging request at ffff882018552020
IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
+ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: netns cleanup_net
task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
Stack:
 ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
Call Trace:
 [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
 [<ffffffff815bdecb>] inet_release+0xfb/0x220
 [<ffffffff815bddf2>] ? inet_release+0x22/0x220
 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
 [<ffffffff8151c1d9>] sock_release+0x29/0xa0
 [<ffffffff81525520>] sk_release_kernel+0x30/0x70
 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
 [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff81076390>] worker_thread+0x120/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da2e>] kthread+0xee/0x100
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
 RSP <ffff881039367bd8>
CR2: ffff882018552020

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11 18:35:25 -07:00
Hannes Frederic Sowa d6245ef7c3 ipv6: only static routes qualify for equal cost multipathing
[ Upstream commit 307f2fb95e9b96b3577916e73d92e104f8f26494 ]

Static routes in this case are non-expiring routes which did not get
configured by autoconf or by icmpv6 redirects.

To make sure we actually get an ecmp route while searching for the first
one in this fib6_node's leafs, also make sure it matches the ecmp route
assumptions.

v2:
a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
   already ensures that this route, even if added again without
   RTF_EXPIRES (in case of a RA announcement with infinite timeout),
   does not cause the rt6i_nsiblings logic to go wrong if a later RA
   updates the expiration time later.

v3:
a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
   because an pmtu event could update the RTF_EXPIRES flag and we would
   not count this route, if another route joins this set. We now filter
   only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
   don't get changed after rt6_info construction.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:30:02 -07:00
Hannes Frederic Sowa b4f1489ed5 ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF
[ Upstream commit afc154e978de1eb11c555bc8bcec1552f75ebc43 ]

This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52
("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
information are available").

Since the removal of rt->n in rt6_info we can end up with a dst ==
NULL in rt6_check_neigh. In case the kernel is not compiled with
CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
NUD state but we must not avoid doing round robin selection on routes
with the same target. So introduce and pass down a boolean ``do_rr'' to
indicate when we should update rt->rr_ptr. As soon as no route is valid
we do backtracking and do a lookup on a higher level in the fib trie.

v2:
a) Improved rt6_check_neigh logic (no need to create neighbour there)
   and documented return values.

v3:
a) Introduce enum rt6_nud_state to get rid of the magic numbers
   (thanks to David Miller).
b) Update and shorten commit message a bit to actualy reflect
   the source.

Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:30:00 -07:00
Hannes Frederic Sowa a025e28ad8 ipv6: in case of link failure remove route directly instead of letting it expire
[ Upstream commit 1eb4f758286884e7566627164bca4c4a16952a83 ]

We could end up expiring a route which is part of an ecmp route set. Doing
so would invalidate the rt->rt6i_nsiblings calculations and could provoke
the following panic:

[   80.144667] ------------[ cut here ]------------
[   80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
[   80.145172] invalid opcode: 0000 [#1] SMP
[   80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
+snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
[   80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
[   80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
[   80.145172] RIP: 0010:[<ffffffff815f3b5d>]  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[   80.145172] RSP: 0018:ffff880118771798  EFLAGS: 00010202
[   80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
[   80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
[   80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
[   80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
[   80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
[   80.145172] FS:  00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
[   80.145172] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
[   80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   80.145172] Stack:
[   80.145172]  0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
[   80.145172]  0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
[   80.145172]  0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
[   80.145172] Call Trace:
[   80.145172]  [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
[   80.145172]  [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
[   80.145172]  [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
[   80.145172]  [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
[   80.145172]  [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
[   80.145172]  [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
[   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[   80.145172]  [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
[   80.145172]  [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
[   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[   80.145172]  [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
[   80.145172]  [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
[   80.145172]  [<ffffffff813007b1>] ? list_del+0x11/0x40
[   80.145172]  [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
[   80.145172]  [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
[   80.145172]  [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
[   80.145172]  [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
[   80.145172]  [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
[   80.145172]  [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
[   80.145172]  [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
[   80.145172]  [<ffffffff8109825c>] ? update_curr+0xec/0x170
[   80.145172]  [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
[   80.145172]  [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
[   80.145172]  [<ffffffff8152509e>] SyS_sendto+0xe/0x10
[   80.145172]  [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
[   80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
[   80.145172] RIP  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[   80.145172]  RSP <ffff880118771798>
[   80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
[   80.390154] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:58 -07:00
Hannes Frederic Sowa c3a5491228 ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
[ Upstream commit 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ]

After the removal of rt->n we do not create a neighbour entry at route
insertion time (rt6_bind_neighbour is gone). As long as no neighbour is
created because of "useful traffic" we skip this routing entry because
rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and
thus returns false.

This change was introduced by commit
887c95cc1d ("ipv6: Complete neighbour
entry removal from dst_entry.")

To quote RFC4191:
"If the host has no information about the router's reachability, then
the host assumes the router is reachable."

and also:
"A host MUST NOT probe a router's reachability in the absence of useful
traffic that the host would have sent to the router if it were reachable."

So, just assume the router is reachable and let's rt6_probe do the
rest. We don't need to create a neighbour on route insertion time.

If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support)
a neighbour is only valid if its nud_state is NUD_VALID. I did not find
any references that we should probe the router on route insertion time
via the other RFCs. So skip this route in that case.

v2:
a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)

Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:50 -07:00
Hannes Frederic Sowa 7852c5bf36 ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size
[ Upstream commit 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be ]

If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track
of this when appending the second frame on a corked socket. This results
in the following splat:

[37598.993962] ------------[ cut here ]------------
[37598.994008] kernel BUG at net/core/skbuff.c:2064!
[37598.994008] invalid opcode: 0000 [#1] SMP
[37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
+nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi
+scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm
[37598.994008]  snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc
+dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
[37598.994008] CPU 0
[37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG
[37598.994008] RIP: 0010:[<ffffffff815443a5>]  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008] RSP: 0018:ffff88003670da18  EFLAGS: 00010202
[37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0
[37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00
[37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040
[37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8
[37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000
[37598.994008] FS:  00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[37598.994008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0
[37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0)
[37598.994008] Stack:
[37598.994008]  ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8
[37598.994008]  ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200
[37598.994008]  0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4
[37598.994008] Call Trace:
[37598.994008]  [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0
[37598.994008]  [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0
[37598.994008]  [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40
[37598.994008]  [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10
[37598.994008]  [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90
[37598.994008]  [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0
[37598.994008]  [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30
[37598.994008]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
[37598.994008]  [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
[37598.994008]  [<ffffffff8153d97d>] sys_sendto+0x12d/0x180
[37598.994008]  [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0
[37598.994008]  [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240
[37598.994008]  [<ffffffff8166a7e7>] tracesys+0xdd/0xe2
[37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48
[37598.994008] RIP  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008]  RSP <ffff88003670da18>
[37599.007323] ---[ end trace d69f6a17f8ac8eee ]---

While there, also check if path mtu discovery is activated for this
socket. The logic was adapted from ip6_append_data when first writing
on the corked socket.

This bug was introduced with commit
0c1833797a ("ipv6: fix incorrect ipsec
fragment").

v2:
a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE.
b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao
   feng, thanks!).
c) Change mtu to unsigned int, else we get a warning about
   non-matching types because of the min()-macro type-check.

Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:50 -07:00
Hannes Frederic Sowa 07243c3d96 ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data
[ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ]

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Dave Jones <davej@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:49 -07:00
Amerigo Wang 36bddbad50 ipv6,mcast: always hold idev->lock before mca_lock
[ Upstream commit 8965779d2c0e6ab246c82a405236b1fb2adae6b2, with
  some bits from commit b7b1bfce0bb68bd8f6e62a28295922785cc63781
  ("ipv6: split duplicate address detection and router solicitation timer")
  to get the __ipv6_get_lladdr() used by this patch. ]

dingtianhong reported the following deadlock detected by lockdep:

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.24.05-0.1-default #1 Not tainted
 -------------------------------------------------------
 ksoftirqd/0/3 is trying to acquire lock:
  (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120

 but task is already holding lock:
  (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&mc->mca_lock){+.+...}:
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60
        [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120
        [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60
        [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80
        [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0
        [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80
        [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20
        [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60
        [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80
        [<ffffffff813d9360>] dev_change_flags+0x40/0x70
        [<ffffffff813ea627>] do_setlink+0x237/0x8a0
        [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600
        [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310
        [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0
        [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40
        [<ffffffff81403e20>] netlink_unicast+0x140/0x180
        [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380
        [<ffffffff813c4252>] sock_sendmsg+0x112/0x130
        [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460
        [<ffffffff813c5544>] sys_sendmsg+0x44/0x70
        [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b

 -> #0 (&ndev->lock){+.+...}:
        [<ffffffff810a798e>] check_prev_add+0x3de/0x440
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f6c82>] rt_read_lock+0x42/0x60
        [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120
        [<ffffffff8149b036>] mld_newpack+0xb6/0x160
        [<ffffffff8149b18b>] add_grhead+0xab/0xc0
        [<ffffffff8149d03b>] add_grec+0x3ab/0x460
        [<ffffffff8149d14a>] mld_send_report+0x5a/0x150
        [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0
        [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0
        [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0
        [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0
        [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0
        [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210
        [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0
        [<ffffffff8106c7de>] kthread+0xae/0xc0
        [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10

actually we can just hold idev->lock before taking pmc->mca_lock,
and avoid taking idev->lock again when iterating idev->addr_list,
since the upper callers of mld_newpack() already take
read_lock_bh(&idev->lock).

Reported-by: dingtianhong <dingtianhong@huawei.com>
Cc: dingtianhong <dingtianhong@huawei.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Chen Weilong <chenweilong@huawei.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:49 -07:00
Hannes Frederic Sowa ce08aa0421 ipv6: only apply anti-spoofing checks to not-pointopoint tunnels
[ Upstream commit 5c29fb12e8fb8a8105ea048cb160fd79a85a52bb ]

Because of commit 218774dc34 ("ipv6: add
anti-spoofing checks for 6to4 and 6rd") the sit driver dropped packets
for 2002::/16 destinations and sources even when configured to work as a
tunnel with fixed endpoint. We may only apply the 6rd/6to4 anti-spoofing
checks if the device is not in pointopoint mode.

This was an oversight from me in the above commit, sorry.  Thanks to
Roman Mamedov for reporting this!

Reported-by: Roman Mamedov <rm@romanrm.ru>
Cc: David Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28 16:29:41 -07:00
Tianyi Gou 55bc24193c net/ipv6/addrconf: IPv6 tethering enhancement
Added new procfs flag to toggle the automatic addition of prefix
routes on a per device basis. The new flag is accept_ra_prefix_route.
Defaults to 1 as to not break existing behavior.

Change-Id: If25493890c7531c27f5b2c4855afebbbbf5d072a
CRs-Fixed: 435320
Acked-by: Harout S. Hedeshian <harouth@qti.qualcomm.com>
Signed-off-by: Tianyi Gou <tgou@codeaurora.org>
2013-07-08 05:55:06 -07:00
JP Abgrall ef08cfe9ce netfilter: ipv6: fix crash caused by ipv6_find_hdr()
When calling:
    ipv6_find_hdr(skb, &thoff, -1, NULL)
on a fragmented packet, thoff would be left with a random
value causing callers to read random memory offsets with:
    skb_header_pointer(skb, thoff, ...)

Now we force ipv6_find_hdr() to return a failure in this case.
Calling:
  ipv6_find_hdr(skb, &thoff, -1, &fragoff)
will set fragoff as expected, and not return a failure.

Change-Id: Ib474e8a4267dd2b300feca325811330329684a88
Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01 13:40:36 -07:00
JP Abgrall 6f489c42a9 netfilter: have ip*t REJECT set the sock err when an icmp is to be sent
Allow the REJECT --reject-with icmp*blabla to also set the matching error
locally on the socket affected by the reject.
This allows the process to see an error almost as if it received it
via ICMP.
It avoids the local process who's ingress packet is rejected to have to
wait for a pseudo-eternity until some timeout kicks in.

Ideally, this should be enabled with a new iptables flag similar to
   --reject-with-sock-err
For now it is enabled with CONFIG_IP*_NF_TARGET_REJECT_SKERR option.

Change-Id: I649a4fd5940029ec0b3233e5abb205da6984891e
Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01 13:40:35 -07:00
Chia-chi Yeh efa0d7175f net: Replace AID_NET_RAW checks with capable(CAP_NET_RAW).
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2013-07-01 13:40:27 -07:00
Robert Love 4b0158841f net: socket ioctl to reset connections matching local address
Introduce a new socket ioctl, SIOCKILLADDR, that nukes all sockets
bound to the same local address. This is useful in situations with
dynamic IPs, to kill stuck connections.

Signed-off-by: Brian Swetland <swetland@google.com>

net: fix tcp_v4_nuke_addr

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix a spinlock recursion bug in tcp_v4_nuke.

We can't hold the lock while calling to tcp_done(), so we drop
it before calling. We then have to start at the top of the chain again.

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix race in tcp_v4_nuke_addr().

To fix a recursive deadlock in 2.6.29, we stopped holding the hash table lock
across tcp_done() calls. This fixed the deadlock, but introduced a race where
the socket could die or change state.

Fix: Before unlocking the hash table, we grab a reference to the socket. We
can then unlock the hash table without risk of the socket going away. We then
lock the socket, which is safe because it is pinned. We can then call
tcp_done() without recursive deadlock and without race. Upon return, we unlock
the socket and then unpin it, killing it.

Change-Id: Idcdae072b48238b01bdbc8823b60310f1976e045
Signed-off-by: Robert Love <rlove@google.com>
Acked-by: Dima Zavin <dima@android.com>

ipv4: disable bottom halves around call to tcp_done().

Signed-off-by: Robert Love <rlove@google.com>
Signed-off-by: Colin Cross <ccross@android.com>

ipv4: Move sk_error_report inside bh_lock_sock in tcp_v4_nuke_addr

When sk_error_report is called, it wakes up the user-space thread, which then
calls tcp_close.  When the tcp_close is interrupted by the tcp_v4_nuke_addr
ioctl thread running tcp_done, it leaks 392 bytes and triggers a WARN_ON.

This patch moves the call to sk_error_report inside the bh_lock_sock, which
matches the locking used in tcp_v4_err.

Signed-off-by: Colin Cross <ccross@android.com>
2013-07-01 13:40:20 -07:00
Robert Love fd64bbf28e Paranoid network.
With CONFIG_ANDROID_PARANOID_NETWORK, require specific uids/gids to instantiate
network sockets.

Signed-off-by: Robert Love <rlove@google.com>

paranoid networking: Use in_egroup_p() to check group membership

The previous group_search() caused trouble for partners with module builds.
in_egroup_p() is also cleaner.

Signed-off-by: Nick Pelly <npelly@google.com>

Fix 2.6.29 build.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

net: Fix compilation of the IPv6 module

Fix compilation of the IPv6 module -- current->euid does not exist anymore,
current_euid() is what needs to be used.

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
2013-07-01 13:40:20 -07:00
Eric Dumazet a963a37d38 ipv6: ip6_sk_dst_check() must not assume ipv6 dst
It's possible to use AF_INET6 sockets and to connect to an IPv4
destination. After this, socket dst cache is a pointer to a rtable,
not rt6_info.

ip6_sk_dst_check() should check the socket dst cache is IPv6, or else
various corruptions/crashes can happen.

Dave Jones can reproduce immediate crash with
trinity -q -l off -n -c sendmsg -c connect

With help from Hannes Frederic Sowa

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-26 15:13:47 -07:00
Hannes Frederic Sowa dc8482926e ipv6: check return value of ipv6_get_lladdr
We should check the return value of ipv6_get_lladdr in inet6_set_iftoken.

A possible situation, which could leave ll_addr unassigned is, when
the user removed her link-local address but a global scoped address was
already set. In this case the interface would still be IF_READY and not
dead. In that case the RS source address is some value from the stack.

v2: Daniel Borkmann noted a small indent inconstancy; no semantic
changes.

Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:27:28 -07:00
YOSHIFUJI Hideaki / 吉藤英明 ab4eb3537e ipv6: Process unicast packet with Router Alert by checking flag in skb.
Router Alert option is marked in skb.
Previously, IP6CB(skb)->ra was set to positive value for such packets.
Since commit dd3332bf ("ipv6: Store Router Alert option in IP6CB
directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and
the value of Router Alert option (in network byte order) is set
to IP6CB(skb)->ra for such packets.

Multicast forwarding path uses that flag and value, but unicast
forwarding path does not use the flag and misuses IP6CB(skb)->ra
value.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 14:47:22 -07:00
David S. Miller a3d9dd89b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains five fixes for Netfilter/IPVS, they are:

* A skb leak fix in fragmentation handling in case that helpers are in place,
  it occurs since the IPV6 NAT infrastructure, from Phil Oester.

* Fix SCTP port mangling in ICMP packets for IPVS, from Julian Anastasov.

* Fix event delivery in ctnetlink regarding the new connlabel infrastructure,
  from Florian Westphal.

* Fix mangling in the SIP NAT helper, from Balazs Peter Odor.

* Fix crash in ipt_ULOG introduced while adding netnamespace support,
  from Gao Feng.

I'll take care of passing several of these patches to -stable once they hit
Linus' tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24 12:45:24 -07:00
Phil Oester 142dcdd3c2 netfilter: nf_conntrack_ipv6: Plug sk_buff leak in fragment handling
In commit 4cdd3408 ("netfilter: nf_conntrack_ipv6: improve fragmentation
handling"), an sk_buff leak was introduced when dealing with reassembled
packets by grabbing a reference to the original skb instead of the
reassembled skb.  At this point, the leak only impacted conntracks with an
associated helper.

In commit 58a317f1 ("netfilter: ipv6: add IPv6 NAT support"), the bug was
expanded to include all reassembled packets with unconfirmed conntracks.

Fix this by grabbing a reference to the proper reassembled skb.  This
closes netfilter bugzilla #823.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 12:01:24 +02:00
Gao feng a881ae1f62 ipv6: don't call addrconf_dst_alloc again when enable lo
If we disable all of the net interfaces, and enable
un-lo interface before lo interface, we already allocated
the addrconf dst in ipv6_add_addr. So we shouldn't allocate
it again when we enable lo interface.

Otherwise the message below will be triggered.
unregister_netdevice: waiting for sit1 to become free. Usage count = 1

This problem is introduced by commit 25fb6ca4ed
"net IPv6 : Fix broken IPv6 routing table after loopback down-up"

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:04:50 -07:00
Matthias Schiffer 33be081a81 ipv6: ndisc: fix ndisc_send_redirect writing to the wrong skb
Since some refactoring in 5f5a011, ndisc_send_redirect called
ndisc_fill_redirect_hdr_option on the wrong skb, leading to data corruption or
in the worst case a panic when the skb_put failed.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 22:59:12 -07:00
Gao feng 534c877928 ipv6: assign rt6_info to inet6_ifaddr in init_loopback
Commit 25fb6ca4ed
"net IPv6 : Fix broken IPv6 routing table after loopback down-up"
forgot to assign rt6_info to the inet6_ifaddr.
When disable the net device, the rt6_info which allocated
in init_loopback will not be destroied in __ipv6_ifa_notify.

This will trigger the waring message below
[23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1

Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 16:57:41 -07:00
Pravin B Shelar 1e2bd517c1 udp6: Fix udp fragmentation for tunnel traffic.
udp6 over GRE tunnel does not work after to GRE tso changes. GRE
tso handler passes inner packet but keeps track of outer header
start in SKB_GSO_CB(skb)->mac_offset.  udp6 fragment need to
take care of outer header, which start at the mac_offset, while
adding fragment header.
This bug is introduced by commit 68c3316311 (GRE: Add TCP
segmentation offload for GRE).

Reported-by: Dmitry Kravkov <dkravkov@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:06:07 -07:00
Nicolas Dichtel fda3f402f4 snmp6: remove IPSTATS_MIB_CSUMERRORS
This stat is not relevant in IPv6, there is no checksum in IPv6 header.
Just leave a comment to explain the hole.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:26:49 -07:00
Florian Westphal 2a7851bffb netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:

[ ip6tables -m addrtype ]
When I tried to use in the nat/PREROUTING it messes up the
routing cache even if the rule didn't matched at all.
[..]
If I remove the --limit-iface-in from the non-working scenario, so just
use the -m addrtype --dst-type LOCAL it works!

This happens when LOCAL type matching is requested with --limit-iface-in,
and the default ipv6 route is via the interface the packet we test
arrived on.

Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
creates an unwanted cached entry, and the packet won't make it to the
real/expected destination.

Silently ignoring --limit-iface-in makes the routing work but it breaks
rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
match if the dst address is configured on the incoming interface;
without --limit-iface-in it will match if the address is reachable
via lo).

The test should call ipv6_chk_addr() instead.  However, this would add
a link-time dependency on ipv6.

There are two possible solutions:

1) Revert the commit that moved ipt_addrtype to xt_addrtype,
   and put ipv6 specific code into ip6t_addrtype.
2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.

While the former might seem preferable, Pablo pointed out that there
are more xt modules with link-time dependeny issues regarding ipv6,
so lets go for 2).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 11:58:55 +02:00
Eric Dumazet 284041ef21 ipv6: fix possible crashes in ip6_cork_release()
commit 0178b695fd ("ipv6: Copy cork options in ip6_append_data")
added some code duplication and bad error recovery, leading to potential
crash in ip6_cork_release() as kfree() could be called with garbage.

use kzalloc() to make sure this wont happen.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
2013-05-18 12:55:45 -07:00
Cong Wang 84c4a9dfbf xfrm6: release dev before returning error
We forget to call dev_put() on error path in xfrm6_fill_dst(),
its caller doesn't handle this.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 17:40:15 -07:00
Amerigo Wang 5dbd506843 ipv6,gre: do not leak info to user-space
There is a hole in struct ip6_tnl_parm2, so we have to
zero the struct on stack before copying it to user-space.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 17:40:14 -07:00
Eric Dumazet f77d602124 ipv6: do not clear pinet6 field
We have seen multiple NULL dereferences in __inet6_lookup_established()

After analysis, I found that inet6_sk() could be NULL while the
check for sk_family == AF_INET6 was true.

Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
and TCP stacks.

Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash
table, we no longer can clear pinet6 field.

This patch extends logic used in commit fcbdf09d96
("net: fix nulls list corruptions in sk_prot_alloc")

TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
to make sure we do not clear pinet6 field.

At socket clone phase, we do not really care, as cloning the parent (non
NULL) pinet6 is not adding a fatal race.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 16:26:38 -07:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
David Howells a8ca16ea7b proc: Supply a function to remove a proc entry by PDE
Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer.  This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:46 -04:00
Eric Dumazet 6a5dc9e598 net: Add MIB counters for checksum errors
Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.

$ nstat -a | grep  Csum
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
Icmp6InCsumErrors               75                 0.0
Udp6InCsumErrors                173442             0.0
IpExtInCsumErrors               10884              0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 15:14:03 -04:00
Pravin B Shelar 5f5624cf15 ipv6: Kill ipv6 dependency of icmpv6_send().
Following patch adds icmp-registration module for ipv6.  It allows
ipv6 protocol to register icmp_sender which is used for sending
ipv6 icmp msgs.  This extra layer allows us to kill ipv6 dependency
for sending icmp packets.

This patch also fixes ip_tunnel compilation problem when ip_tunnel
is statically compiled in kernel but ipv6 is module

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 13:54:36 -04:00
David S. Miller 6e0895c2ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	drivers/net/ethernet/intel/igb/igb_main.c
	drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
	include/net/scm.h
	net/batman-adv/routing.c
	net/ipv4/tcp_input.c

The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.

The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.

An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.

Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.

Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.

Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 20:32:51 -04:00
David S. Miller 95a06161e6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains a small batch of Netfilter
updates for your net-next tree, they are:

* Three patches that provide more accurate error reporting to
  user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing
  code and NAT, from Patrick McHardy.

* Update copyright statements in Netfilter filters of
  Patrick McHardy, from himself.

* Add Kconfig dependency on the raw/mangle tables to the
  rpfilter, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:55:29 -04:00
David S. Miller fd7fc25328 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
If time allows, please consider pulling the following patchset contains two
late Netfilter fixes, they are:

* Skip broadcast/multicast locally generated traffic in the rpfilter,
  (closes netfilter bugzilla #814), from Florian Westphal.

* Fix missing elements in the listing of ipset bitmap ip,mac set
  type with timeout support enabled, from Jozsef Kadlecsik.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:24:47 -04:00
Florian Westphal d37d696804 netfilter: xt_rpfilter: depend on raw or mangle table
rpfilter is only valid in raw/mangle PREROUTING, i.e.
RPFILTER=y|m is useless without raw or mangle table support.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-19 00:22:55 +02:00
Florian Westphal f83a7ea207 netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too
Alex Efros reported rpfilter module doesn't match following packets:
IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
(netfilter bugzilla #814).

Problem is that network stack arranges for the locally generated broadcasts
to appear on the interface they were sent out, so the IFF_LOOPBACK check
doesn't trigger.

As -m rpfilter is restricted to PREROUTING, we can check for existing
rtable instead, it catches locally-generated broad/multicast case, too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-19 00:11:59 +02:00
Patrick McHardy f229f6ce48 netfilter: add my copyright statements
Add copyright statements to all netfilter files which have had significant
changes done by myself in the past.

Some notes:

- nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
  Core Team when it got split out of nf_conntrack_core.c. The copyrights
  even state a date which lies six years before it was written. It was
  written in 2005 by Harald and myself.

- net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright
  statements. I've added the copyright statement from net/netfilter/core.c,
  where this code originated

- for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
  it to give the wrong impression

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-18 20:27:55 +02:00
Eric Dumazet 97599dc792 net: drop dst before queueing fragments
Commit 4a94445c9a (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.

Commit 64f3b9e203 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.

Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.

Before queueing a packet into a frag list, we must drop its dst,
as this dst has limited lifetime (RCU protected)

When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.

Use same logic in IPv6, as there is no need to hold dst references.

Reported-by: Tom Parkin <tparkin@katalix.com>
Tested-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 01:15:29 -04:00
Daniel Borkmann bf84a01063 net: sock: make sock_tx_timestamp void
Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existant_ error case in the
output path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:41:49 -04:00
Cong Wang f88c91ddba ipv6: statically link register_inet6addr_notifier()
Tomas reported the following build error:

net/built-in.o: In function `ieee80211_unregister_hw':
(.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier'
net/built-in.o: In function `ieee80211_register_hw':
(.text+0x10f610): undefined reference to `register_inet6addr_notifier'
make: *** [vmlinux] Error 1

when built IPv6 as a module.

So we have to statically link these symbols.

Reported-by: Tomas Melin <tomas.melin@iki.fi>
Cc: Tomas Melin <tomas.melin@iki.fi>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:24:17 -04:00
David S. Miller 16e3d9648a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1)  Allow to avoid copying DSCP during encapsulation
    by setting a SA flag. From Nicolas Dichtel.

2) Constify the netlink dispatch table, no need to modify it
   at runtime. From Mathias Krause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:14:37 -04:00
Al Viro d9dda78bad procfs: new helper - PDE_DATA(inode)
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data.  Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:32 -04:00
Daniel Borkmann 617fe29d45 net: ipv6: only invalidate previously tokenized addresses
Instead of invalidating all IPv6 addresses with global scope
when one decides to use IPv6 tokens, we should only invalidate
previous tokens and leave the rest intact until they expire
eventually (or are intact forever). For doing this less greedy
approach, we're adding a bool at the end of inet6_ifaddr structure
instead, for two reasons: i) per-inet6_ifaddr flag space is
already used up, making it wider might not be a good idea,
since ii) also we do not necessarily need to export this
information into user space.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Daniel Borkmann fc403832f7 net: ipv6: also allow token to be set when device not ready
When we set the iftoken in inet6_set_iftoken(), we return -EINVAL
when the device does not have flag IF_READY. This is however not
necessary and rather an artificial usability barrier, since we
simply can set the token despite that, and in case the device is
ready, we just send out our rs, otherwise ifup et al. will do
this for us anyway.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Daniel Borkmann 914faa147b net: ipv6: minor: use in6addr_any in token init
Since we check for !ipv6_addr_any(&in6_dev->token) in
addrconf_prefix_rcv(), make the token initialization on
device setup more intuitive by using in6addr_any as an
initializer.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Daniel Borkmann f53adae4ea net: ipv6: add tokenized interface identifier support
This patch adds support for IPv6 tokenized IIDs, that allow
for administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix from
Router Advertisements. It is currently in draft status.

  The primary target for such support is server platforms
  where addresses are usually manually configured, rather
  than using DHCPv6 or SLAAC. By using tokenised identifiers,
  hosts can still determine their network prefix by use of
  SLAAC, but more readily be automatically renumbered should
  their network prefix change. [...]

  The disadvantage with static addresses is that they are
  likely to require manual editing should the network prefix
  in use change.  If instead there were a method to only
  manually configure the static identifier part of the IPv6
  address, then the address could be automatically updated
  when a new prefix was introduced, as described in [RFC4192]
  for example.  In such cases a DNS server might be
  configured with such a tokenised interface identifier of
  ::53, and SLAAC would use the token in constructing the
  interface address, using the advertised prefix. [...]

  http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02

The implementation is partially based on top of Mark K.
Thompson's proof of concept. However, it uses the Netlink
interface for configuration resp. data retrival, so that
it can be easily extended in future. Successfully tested
by myself.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 16:55:28 -04:00
Patrick McHardy aaa795ad25 netfilter: nat: propagate errors from xfrm_me_harder()
Propagate errors from ip_xfrm_me_harder() instead of returning EPERM in
all cases.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
Patrick McHardy 58e35d1471 netfilter: ipv6: propagate routing errors from ip6_route_me_harder()
Propagate routing errors from ip_route_me_harder() when dropping a packet
using NF_DROP_ERR(). This makes userspace get the proper error instead of
EPERM for everything.

# ip -6 r a unreachable default table 100
# ip -6 ru add fwmark 0x1 lookup 100
# ip6tables -t mangle -A OUTPUT -d 2001:4860:4860::8888 -j MARK --set-mark 0x1

Old behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

New behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
David S. Miller d978a6361a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/nfc/microread/mei.c
	net/netfilter/nfnetlink_queue_core.c

Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:37:01 -04:00
Christoph Paasch 50a75a8914 ipv6/tcp: Stop processing ICMPv6 redirect messages
Tetja Rediske found that if the host receives an ICMPv6 redirect message
after sending a SYN+ACK, the connection will be reset.

He bisected it down to 093d04d (ipv6: Change skb->data before using
icmpv6_notify() to propagate redirect), but the origin of the bug comes
from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error
handlers.). The bug simply did not trigger prior to 093d04d, because
skb->data did not point to the inner IP header and thus icmpv6_notify
did not call the correct err_handler.

This patch adds the missing "goto out;" in tcp_v6_err. After receiving
an ICMPv6 Redirect, we should not continue processing the ICMP in
tcp_v6_err, as this may trigger the removal of request-socks or setting
sk_err(_soft).

Reported-by: Tetja Rediske <tetja@tetja.de>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:36:08 -04:00
David S. Miller d16658206a Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter and IPVS updates for
your net-next tree, most relevantly they are:

* Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE.
  The LOG and ebt_log target has been also adapted, but they still
  depend on the syslog netnamespace that seems to be missing, from
  Gao Feng.

* Don't lose indications of congestion in IPv6 fragmentation handling,
  from Hannes Frederic Sowa.i

* IPVS conversion to use RCU, including some code consolidation patches
  and optimizations, also some from Julian Anastasov.

* cpu fanout support for NFQUEUE, from Holger Eitzenberger.

* Better error reporting to userspace when dropping packets from
  all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:22:06 -04:00
Hannes Frederic Sowa b8dd6a223e netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
This change brings netfilter reassembly logic on par with
reassembly.c. The corresponding change in net-next is
(eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for
ipv6 fragmentation handling)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-06 13:06:37 +02:00
Gao feng 30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
David S. Miller 4f4ecd5f2a Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter updates for your net tree,
they are:

* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
  while using the TRACE target with several net namespaces.

* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
  are used, from Matthias Schiffer.

* Fix invalid nfacct objects with empty name, they are now rejected
  with -EINVAL, spotted by Michael Zintakis, patch from myself.

* A couple of fixes for wrong return values in the error path of
  nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:41:53 -04:00
Matthias Schiffer 906b1c394d netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths
The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-03 12:24:56 +02:00
David S. Miller d662483264 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull net into net-next to get the synchronize_net() bug fix in
bonding.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-03 01:31:54 -04:00
Balakumaran Kannan 25fb6ca4ed net IPv6 : Fix broken IPv6 routing table after loopback down-up
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:37:19 -04:00
David S. Miller a210576cf8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/mac80211/sta_info.c
	net/wireless/core.h

Two minor conflicts in wireless.  Overlapping additions of extern
declarations in net/wireless/core.h and a bug fix overlapping with
the addition of a boolean parameter to __ieee80211_key_free().

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-01 13:36:50 -04:00
Hannes Frederic Sowa 1c4a154e52 ipv6: don't accept node local multicast traffic from the wire
Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
verified: http://www.rfc-editor.org/errata_search.php?eid=3480

We have to check for pkt_type and loopback flag because either the
packets are allowed to travel over the loopback interface (in which case
pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
over a non-loopback interface back to us (in which case PACKET_TYPE is
PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 14:57:33 -04:00
Hong zhi guo 573ce260b3 net-next: replace obsolete NLMSG_* with type safe nlmsg_*
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:25:25 -04:00
David S. Miller e2a553dbf1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/ipip.h

The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 13:52:49 -04:00
YOSHIFUJI Hideaki / 吉藤英明 cb6bf35502 firewire net, ipv6: IPv6 over Firewire (RFC3146) support.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:32:13 -04:00
Pravin B Shelar f61dd388a9 Tunneling: use IP Tunnel stats APIs.
Use common function get calculate rtnl_link_stats64 stats.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:19 -04:00
Pravin B Shelar c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Hong Zhiguo a79ca223e0 ipv6: fix bad free of addrconf_init_net
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 13:55:38 -04:00
David S. Miller da13482534 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS updates for
your net-next tree, they are:

* Better performance in nfnetlink_queue by avoiding copy from the
  packet to netlink message, from Eric Dumazet.

* Remove unnecessary locking in the exit path of ebt_ulog, from Gao Feng.

* Use new function ipv6_iface_scope_id in nf_ct_ipv6, from Hannes Frederic Sowa.

* A couple of sparse fixes for IPVS, from Julian Anastasov.

* Use xor hashing in nfnetlink_queue, as suggested by Eric Dumazet, from
  myself.

* Allow to dump expectations per master conntrack via ctnetlink, from myself.

* A couple of cleanups to use PTR_RET in module init path, from Silviu-Mihai
  Popescu.

* Remove nf_conntrack module a bit faster if netns are in use, from
  Vladimir Davydov.

* Use checksum_partial in ip6t_NPT, from YOSHIFUJI Hideaki.

* Sparse fix for nf_conntrack, from Stephen Hemminger.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:11:44 -04:00
Hannes Frederic Sowa eec2e6185f ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- >8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Nicolas Dichtel 63998ac24f ipv6: provide addr and netconf dump consistency info
This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:29 -04:00
Thomas Graf 661d2967b3 rtnetlink: Remove passing of attributes into rtnl_doit functions
With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:31:16 -04:00
David S. Miller 61816596d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:46:26 -04:00
Tom Parkin 44046a593e udp: add encap_destroy callback
Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.

In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation.  For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.

This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
David S. Miller 90b2621fd4 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:

* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
  complaining that this target does not work in the nat table, which is the
  wrong table for it, from Florian Westphal.

* Fix possible use before initialization in the netns init path of several
  conntrack protocol trackers (introduced recently while improving conntrack
  netns support), from Gao Feng.

* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
  by Eric Dumazet during the NFWS2013, patch from myself.

* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.

* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
  not required anymore after change introduced in 3.7, again from Julian.

* Fix SYN looping in IPVS state sync if the backup is used a real server
  in DR/TUN modes, this required a new /proc entry to disable the director
  function when acting as backup, also from Julian.

* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
  Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 10:23:52 -04:00
Hannes Frederic Sowa 5a3da1fe95 inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 10:28:36 -04:00
Eric Dumazet 0d4f060861 tcp: dont handle MTU reduction on LISTEN socket
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
LISTEN socket, and this socket is currently owned by the user, we
set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.

This is bad because if we clone the parent before it had a chance to
clear the flag, the child inherits the tsq_flags value, and next
tcp_release_cb() on the child will decrement sk_refcnt.

Result is that we might free a live TCP socket, as reported by
Dormando.

IPv4: Attempt to release TCP socket in state 1

Fix this issue by testing sk_state against TCP_LISTEN early, so that we
set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)

This bug was introduced in commit 563d34d057
(tcp: dont drop MTU reduction indications)

Reported-by: dormando <dormando@rydia.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-18 13:31:28 -04:00
Christoph Paasch 1a2c6181c4 tcp: Remove TCPCT
TCPCT uses option-number 253, reserved for experimental use and should
not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
	average (among 7 runs) of 20845.5 Requests/Second
after:
	average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 14:35:13 -04:00
Florian Westphal a82783c91d netfilter: ip6t_NPT: restrict to mangle table
As the translation is stateless, using it in nat table
doesn't work (only initial packet is translated).
filter table OUTPUT works but won't re-route the packet after translation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:58:21 +01:00
Hannes Frederic Sowa d00bd3d4fb netfilter: nf_ct_ipv6: use ipv6_iface_scope_id in conntrack to return scope id
As in (842df07 ipv6: use newly introduced __ipv6_addr_needs_scope_id and
ipv6_iface_scope_id).

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:48:43 +01:00
YOSHIFUJI Hideaki 2d2fd8c50a netfilter: ip6t_NPT: Use csum_partial()
[ Some fixes went into mainstream before this patch, so I needed
  to rebase it upon the current tree, that's why it's different from
  the original one posted on the list --pablo ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:02:04 +01:00
David S. Miller e5f2ef7ab4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/netdev.c

Minor conflict in e1000e, a line that got fixed in 'net'
has been removed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:52:22 -04:00
Cong Wang e8f72ea4a1 ipv6: introduce ip6tunnel_xmit() helper
Similar to iptunnel_xmit(), group these operations into a
helper function.

This by the way fixes the missing u64_stats_update_begin()
and u64_stats_update_end() for 32 bit arch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:53:34 -04:00
Mathias Krause 22c352195e ipv6: remove superfluous nla_data() NULL pointer checks
nla_data() cannot return NULL, so these NULL pointer checks are
superfluous.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:46:09 -04:00
Cong Wang 6aed0c8bf7 tunnel: use iptunnel_xmit() again
With recent patches from Pravin, most tunnels can't use iptunnel_xmit()
any more, due to ip_select_ident() and skb->ip_summed. But we can just
move these operations out of iptunnel_xmit(), so that tunnels can
use it again.

This by the way fixes a bug in vxlan (missing nf_reset()) for net-next.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 03:05:44 -04:00
Pravin B Shelar 7313626745 tunneling: Add generic Tunnel segmentation.
Adds generic tunneling offloading support for IPv4-UDP based
tunnels.
GSO type is added to request this offload for a skb.
netdev feature NETIF_F_UDP_TUNNEL is added for hardware offloaded
udp-tunnel support. Currently no device supports this feature,
software offload is used.

This can be used by tunneling protocols like VXLAN.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:17 -05:00
Pravin B Shelar ec5f061564 net: Kill link between CSUM and SG features.
Earlier SG was unset if CSUM was not available for given device to
force skb copy to avoid sending inconsistent csum.
Commit c9af6db4c1 (net: Fix possible wrong checksum generation)
added explicit flag to force copy to fix this issue.  Therefore
there is no need to link SG and CSUM, following patch kills this
link between there two features.

This patch is also required following patch in series.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
Hannes Frederic Sowa 3868b7aa76 ipv6: report sin6_scope_id if sockopt RECVORIGDSTADDR is set
v4:
a) unchanged

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:23 -05:00
Hannes Frederic Sowa 842df07397 ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id
This patch requires multicast interface-scoped addresses to supply a
sin6_scope_id. Because the sin6_scope_id is now also correctly used
in case of interface-scoped multicast traffic this enables one to use
interface scoped addresses over interfaces which are not targeted by the
default multicast route (the route has to be put there manually, though).

getsockname() and getpeername() now return the correct sin6_scope_id in
case of interface-local mc addresses.

v2:
a) rebased ontop of patch 1/4 (now uses ipv6_addr_props)

v3:
a) reverted changes for ipv6_addr_props

v4:
a) unchanged

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>dave
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
Thomas Graf 80580d4b20 ipv6: ndisc: remove redundant check for !dev->addr_len
send_sllao is already initialized with the value of dev->addr_len

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
Hannes Frederic Sowa ddf64354af ipv6: stop multicast forwarding to process interface scoped addresses
v2:
a) used struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props

v4:
a) do not use __ipv6_addr_needs_scope_id

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:28:20 -05:00
Eric Dumazet 7f0e44ac9f ipv6 flowlabel: add __rcu annotations
Commit 18367681a1 (ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.)
omitted proper __rcu annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:33:10 -05:00
Nicolas Dichtel 7a6742003f netconf: add the handler to dump entries
It's useful to be able to get the initial state of all entries. The patch adds
the support for IPv4 and IPv6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 15:40:53 -05:00
Nicolas Dichtel a947b0a93e xfrm: allow to avoid copying DSCP during encapsulation
By default, DSCP is copying during encapsulation.
Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
different DSCP may get reordered relative to each other in the network and then
dropped by the remote IPsec GW if the reordering becomes too big compared to the
replay window.

It is possible to avoid this copy with netfilter rules, but it's very convenient
to be able to configure it for each SA directly.

This patch adds a toogle for this purpose. By default, it's not set to maintain
backward compatibility.

Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-06 07:02:45 +01:00
Flavio Leitner dd9f319d94 tcp: ipv6: bind() use stronger condition for bind_conflict
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

This is a continuation of commit aacd9289af
for IPv6.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-05 23:40:00 -05:00
Linus Torvalds 9da060d0ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add AISX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...
2013-03-05 18:42:29 -08:00
Lorenzo Colitti 3e8b0ac3e4 net: ipv6: Don't purge default router if accept_ra=2
Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel
to accept RAs even when forwarding is enabled. However, enabling
forwarding purges all default routes on the system, breaking
connectivity until the next RA is received. Fix this by not
purging default routes on interfaces that have accept_ra=2.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Neil Horman d8c6f4b9b7 ipv[4|6]: correct dropwatch false positive in local_deliver_finish
I had a report recently of a user trying to use dropwatch to localise some frame
loss, and they were getting false positives.  Turned out they were using a user
space SCTP stack that used raw sockets to grab frames.  When we don't have a
registered protocol for a given packet, we record it as a drop, even if a raw
socket receieves the frame.  We should only record the drop in the event a raw
socket doesnt exist to receive the frames

Tested by the reported successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
Tested-by: William Reich <reich@ulticom.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: William Reich <reich@ulticom.com>
CC: eric.dumazet@gmail.com
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01 15:56:29 -05:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds 7c2db36e73 Merge branch 'akpm' (incoming from Andrew)
Merge misc patches from Andrew Morton:

 - Florian has vanished so I appear to have become fbdev maintainer
   again :(

 - Joel and Mark are distracted to welcome to the new OCFS2 maintainer

 - The backlight queue

 - Small core kernel changes

 - lib/ updates

 - The rtc queue

 - Various random bits

* akpm: (164 commits)
  rtc: rtc-davinci: use devm_*() functions
  rtc: rtc-max8997: use devm_request_threaded_irq()
  rtc: rtc-max8907: use devm_request_threaded_irq()
  rtc: rtc-da9052: use devm_request_threaded_irq()
  rtc: rtc-wm831x: use devm_request_threaded_irq()
  rtc: rtc-tps80031: use devm_request_threaded_irq()
  rtc: rtc-lp8788: use devm_request_threaded_irq()
  rtc: rtc-coh901331: use devm_clk_get()
  rtc: rtc-vt8500: use devm_*() functions
  rtc: rtc-tps6586x: use devm_request_threaded_irq()
  rtc: rtc-imxdi: use devm_clk_get()
  rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
  rtc: rtc-pcf8583: use dev_warn() instead of printk()
  rtc: rtc-sun4v: use pr_warn() instead of printk()
  rtc: rtc-vr41xx: use dev_info() instead of printk()
  rtc: rtc-rs5c313: use pr_err() instead of printk()
  rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
  rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
  rtc: rtc-ds2404: use dev_err() instead of printk()
  rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
  ...
2013-02-21 17:38:49 -08:00
Christian Kujau 242260fb85 sun.com documentation fixes
After I came across a help text for SUNGEM mentioning a broken sun.com
URL, I felt like fixing those up, as they are now pointing to oracle.com
URLs.

Signed-off-by: Christian Kujau <lists@nerdbynature.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21 17:22:20 -08:00
Linus Torvalds 06991c28f3 Driver core patches for 3.9-rc1
Here is the big driver core merge for 3.9-rc1
 
 There are two major series here, both of which touch lots of drivers all
 over the kernel, and will cause you some merge conflicts:
   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.
   - remove CONFIG_EXPERIMENTAL
 
 If you need me to provide a merged tree to handle these resolutions,
 please let me know.
 
 Other than those patches, there's not much here, some minor fixes and
 updates.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
 =yWAQ
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
 "Here is the big driver core merge for 3.9-rc1

  There are two major series here, both of which touch lots of drivers
  all over the kernel, and will cause you some merge conflicts:

   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.

   - remove CONFIG_EXPERIMENTAL

  Other than those patches, there's not much here, some minor fixes and
  updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
  base: memory: fix soft/hard_offline_page permissions
  drivercore: Fix ordering between deferred_probe and exiting initcalls
  backlight: fix class_find_device() arguments
  TTY: mark tty_get_device call with the proper const values
  driver-core: constify data for class_find_device()
  firmware: Ignore abort check when no user-helper is used
  firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
  firmware: Make user-mode helper optional
  firmware: Refactoring for splitting user-mode helper code
  Driver core: treat unregistered bus_types as having no devices
  watchdog: Convert to devm_ioremap_resource()
  thermal: Convert to devm_ioremap_resource()
  spi: Convert to devm_ioremap_resource()
  power: Convert to devm_ioremap_resource()
  mtd: Convert to devm_ioremap_resource()
  mmc: Convert to devm_ioremap_resource()
  mfd: Convert to devm_ioremap_resource()
  media: Convert to devm_ioremap_resource()
  iommu: Convert to devm_ioremap_resource()
  drm: Convert to devm_ioremap_resource()
  ...
2013-02-21 12:05:51 -08:00
YOSHIFUJI Hideaki / 吉藤英明 ecd9883724 ipv6: fix race condition regarding dst->expires and dst->from.
Eric Dumazet wrote:
| Some strange crashes happen in rt6_check_expired(), with access
| to random addresses.
|
| At first glance, it looks like the RTF_EXPIRES and
| stuff added in commit 1716a96101
| (ipv6: fix problem with expired dst cache)
| are racy : same dst could be manipulated at the same time
| on different cpus.
|
| At some point, our stack believes rt->dst.from contains a dst pointer,
| while its really a jiffie value (as rt->dst.expires shares the same area
| of memory)
|
| rt6_update_expires() should be fixed, or am I missing something ?
|
| CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060

Because we do not have any locks for dst_entry, we cannot change
essential structure in the entry; e.g., we cannot change reference
to other entity.

To fix this issue, split 'from' and 'expires' field in dst_entry
out of union.  Once it is 'from' is assigned in the constructor,
keep the reference until the very last stage of the life time of
the object.

Of course, it is unsafe to change 'from', so make rt6_set_from simple
just for fresh entries.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Neil Horman <nhorman@tuxdriver.com>
CC: Gao Feng <gaofeng@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: Steinar H. Gunderson <sesse@google.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 15:11:45 -05:00
David S. Miller 2ccba5433b Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contain updates for your net-next tree, they are:

* Fix (for just added) connlabel dependencies, from Florian Westphal.

* Add aliasing support for conntrack, thus users can either use -m state
  or -m conntrack from iptables while using the same kernel module, from
  Jozsef Kadlecsik.

* Some code refactoring for the CT target to merge common code in
  revision 0 and 1, from myself.

* Add aliasing support for CT, based on patch from Jozsef Kadlecsik.

* Add one mutex per nfnetlink subsystem, from myself.

* Improved logging for packets that are dropped by helpers, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 23:42:09 -05:00
David S. Miller 6338a53a2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into net
Pull in 'net' to take in the bug fixes that didn't make it into
3.8-final.

Also, deal with the semantic conflict of the change made to
net/ipv6/xfrm6_policy.c   A missing rt6->n neighbour release
was added to 'net', but in 'net-next' we no longer cache the
neighbour entries in the ipv6 routes so that change is not
appropriate there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 23:34:21 -05:00
Pablo Neira Ayuso b20ab9cc63 netfilter: nf_ct_helper: better logging for dropped packets
Connection tracking helpers have to drop packets under exceptional
situations. Currently, the user gets the following logging message
in case that happens:

	nf_ct_%s: dropping packet ...

However, depending on the helper, there are different reasons why a
packet can be dropped.

This patch modifies the existing code to provide more specific
error message in the scope of each helper to help users to debug
the reason why the packet has been dropped, ie:

	nf_ct_%s: dropping packet: reason ...

Thanks to Joe Perches for many formatting suggestions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-19 02:48:05 +01:00
Eric Dumazet 8064b3cf75 ipv6: fix a sparse warning
net/ipv6/reassembly.c:82:72: warning: incorrect type in argument 3 (different base types)
net/ipv6/reassembly.c:82:72:    expected unsigned int [unsigned] [usertype] c
net/ipv6/reassembly.c:82:72:    got restricted __be32 [usertype] id

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 15:28:00 -05:00
Romain KUNTZ 18cf0d0784 xfrm: release neighbor upon dst destruction
Neighbor is cloned in xfrm6_fill_dst but seems to never be released.
Neighbor entry should be released when XFRM6 dst entry is destroyed
in xfrm6_dst_destroy, otherwise references may be kept forever on
the device pointed by the neighbor entry.

I may not have understood all the subtleties of XFRM & dst so I would
be happy to receive comments on this patch.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 14:57:29 -05:00
Gao feng ece31ffd53 net: proc: change proc_net_remove to remove_proc_entry
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.

this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 14:53:08 -05:00
Gao feng d4beaa66ad net: proc: change proc_net_fops_create to proc_create
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 14:53:08 -05:00
stephen hemminger 0d4bfa297c ipv6: fix warning in xfrm6_mode_tunnel_input
Should not use assignment in conditional:
 warning: suggest parentheses around assignment used as truth value [-Wparentheses]

Problem introduced by:
commit 14bbd6a565
Author: Pravin B Shelar <pshelar@nicira.com>
Date:   Thu Feb 14 09:44:49 2013 +0000

    net: Add skb_unclone() helper function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:42:47 -05:00
Eric Dumazet 279e9f2ff7 ipv6: optimize inet6_hash_frag()
Use ipv6_addr_hash() and a single jhash invocation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:23:51 -05:00
Pravin B Shelar 68c3316311 v4 GRE: Add TCP segmentation offload for GRE
Following patch adds GRE protocol offload handler so that
skb_gso_segment() can segment GRE packets.
SKB GSO CB is added to keep track of total header length so that
skb_segment can push entire header. e.g. in case of GRE, skb_segment
need to push inner and outer headers to every segment.
New NETIF_F_GRE_GSO feature is added for devices which support HW
GRE TSO offload. Currently none of devices support it therefore GRE GSO
always fall backs to software GSO.

[ Compute pkt_len before ip_local_out() invocation. -DaveM ]

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-15 15:17:11 -05:00
Pravin B Shelar 14bbd6a565 net: Add skb_unclone() helper function.
This function will be used in next GRE_GSO patch. This patch does
not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2013-02-15 15:10:37 -05:00
David S. Miller e0376d0043 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.

2) Prepare xfrm and pf_key for algorithms without pf_key support,
   from Jussi Kivilinna.

3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.

4) Add an IPsec state resolution packet queue to handle
   packets that are send before the states are resolved.

5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
   From Michal Kubecek.

6) The xfrm gc threshold was configurable just in the initial
   namespace, make it configurable in all namespaces. From
   Michal Kubecek.

7) We currently can not insert policies with mark and mask
   such that some flows would be matched from both policies.
   Allow this if the priorities of these policies are different,
   the one with the higher priority is used in this case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-14 13:29:20 -05:00
Michal Kubeček 894e2ac82b netfilter: nf_ct_reasm: fix per-netns sysctl initialization
Adjusting of data pointers in net/netfilter/nf_conntrack_frag6_*
sysctl table for other namespaces points to wrong netns_frags
structure and has reversed order of entries.

Problem introduced by commit c038a767cd in 3.7-rc1

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-13 20:46:06 +01:00
Pravin B Shelar c9af6db4c1 net: Fix possible wrong checksum generation.
Patch cef401de7b (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.

Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.

tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 13:30:10 -05:00
Andrey Vagin ee684b6f28 tcp: send packets with a socket timestamp
A socket timestamp is a sum of the global tcp_time_stamp and
a per-socket offset.

A socket offset is added in places where externally visible
tcp timestamp option is parsed/initialized.

Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way
as for TIME_WAIT sockets.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 13:22:16 -05:00
David S. Miller 9f6d98c298 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

The bnx2x gso_type setting bug fix in 'net' conflicted with
changes in 'net-next' that broke the gso_* setting logic
out into a seperate function, which also fixes the bug in
question.  Thus, use the 'net-next' version.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-12 18:58:28 -05:00
Hannes Frederic Sowa 2c5e893384 ipv6: by default join ff01::1 and in case of forwarding ff01::2 and ff05:2
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 14:16:26 -05:00
Hannes Frederic Sowa 20314092c1 ipv6: don't accept multicast traffic with scope 0
v2:
a) moved before multicast source address check
b) changed comment to netdev style

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 14:00:54 -05:00
Hannes Frederic Sowa dd40851521 ipv6: don't let node/interface scoped multicast traffic escape on the wire
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 14:00:54 -05:00
YOSHIFUJI Hideaki / 吉藤英明 ec16ef2228 ipv6 mcast: Do not join device multicast for interface-local multicasts.
RFC4291 (IPv6 addressing architecture) says that interface-Local scope
spans only a single interface on a node.  We should not join L2 device
multicast list for addresses in interface-local (or smaller) scope.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 00:21:44 -05:00
David S. Miller cfa82e020b Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for 3.8-rc7, they are:

* Fix oops in IPVS state-sync due to releasing a random memory area due
  to unitialized pointer, from Dan Carpenter.

* Fix SCTP flow establishment due to bad checksumming mangling in IPVS,
  from Daniel Borkmann.

* Three fixes for the recently added IPv6 NPT, all from YOSHIFUJI Hideaki,
  with an amendment collapsed into those patches from Ulrich Weber. They
  fiix adjustment calculation, fix prefix mangling and ensure LSB of
  prefixes are zeroes (as required by RFC).

Specifically, it took me a while to validate the 1's complement arithmetics/
checksumming approach in the IPv6 NPT code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-10 20:44:08 -05:00
David S. Miller fd5023111c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 18:02:14 -05:00
Amerigo Wang 6a98dcf032 ipv6: fix a RCU warning in net/ipv6/ip6_flowlabel.c
This patch fixes the following RCU warning:

[   51.680236] ===============================
[   51.681914] [ INFO: suspicious RCU usage. ]
[   51.683610] 3.8.0-rc6-next-20130206-sasha-00028-g83214f7-dirty #276 Tainted: G        W
[   51.686703] -------------------------------
[   51.688281] net/ipv6/ip6_flowlabel.c:671 suspicious rcu_dereference_check() usage!

we should use rcu_dereference_bh() when we hold rcu_read_lock_bh().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 17:55:48 -05:00
YOSHIFUJI Hideaki / 吉藤英明 edb27228db netfilter: ip6t_NPT: Ensure to check lower part of prefixes are zero
RFC 6296 points that address bits that are not part of the prefix
has to be zeroed.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-07 18:40:27 +01:00
YOSHIFUJI Hideaki / 吉藤英明 d4c38fa87d netfilter: ip6t_NPT: Fix prefix mangling
Make sure only the bits that are part of the prefix are mangled.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-07 18:40:26 +01:00
YOSHIFUJI Hideaki / 吉藤英明 f5271fff56 netfilter: ip6t_NPT: Fix adjustment calculation
Cast __wsum from/to __sum16 is wrong.  Instead, apply appropriate
conversion function: csum_unfold() or csum_fold().

[ The original patch has been modified to undo the final ~ that
  csum_fold returns. We only need to fold the 32-bit word that
  results from the checksum calculation into a 16-bit to ensure
  that the original subnet is restored appropriately. Spotted by
  Ulrich Weber. ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-07 18:37:41 +01:00
Tommi Rantala 41ab3e31bd ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
ip6gre_tunnel_xmit() is leaking the skb when we hit this error branch,
and the -1 return value from this function is bogus. Use the error
handling we already have in place in ip6gre_tunnel_xmit() for this error
case to fix this.

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-06 16:02:05 -05:00
Steffen Klassert f4e53e292a ipv6: Don't send packet to big messages to self
Calling icmpv6_send() on a local message size error leads to an
incorrect update of the path mtu in the case when IPsec is used.
So use ipv6_local_error() instead to notify the socket about the
error.

Reported-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-06 15:12:39 -05:00
Michal Kubecek 8d068875ca xfrm: make gc_thresh configurable in all namespaces
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-06 11:36:29 +01:00
David S. Miller 188d1f76d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/ethtool.c
	drivers/net/vmxnet3/vmxnet3_drv.c
	drivers/net/wireless/iwlwifi/dvm/tx.c
	net/ipv6/route.c

The ipv6 route.c conflict is simple, just ignore the 'net' side change
as we fixed the same problem in 'net-next' by eliminating cached
neighbours from ipv6 routes.

The e1000e conflict is an addition of a new statistic in the ethtool
code, trivial.

The vmxnet3 conflict is about one change in 'net' removing a guarding
conditional, whilst in 'net-next' we had a netdev_info() conversion.

The iwlwifi conflict is dealing with a WARN_ON() conversion in
'net-next' vs. a revert happening in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05 14:12:20 -05:00
Jean Sacren 56db1c5f41 mcast: do not check 'rv' twice in a row
With the loop, don't check 'rv' twice in a row. Without the loop, 'rv'
doesn't even need to be checked.

Make the comment more grammar-friendly.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:26:49 -05:00
Vijay Subramanian 5f1e942cb4 tcp: ipv6: Update MIB counters for drops
This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in
tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can
drop SYNs for various reasons which are not currently tracked.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:06:27 -05:00
Tom Parkin 8e72d37eb3 ipv6: export ip6_datagram_recv_ctl
ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6
ancillary data.  Since ip6_datagram_send_ctl is already publicly exported for
use in modules, ip6_datagram_recv_ctl should also be available to support
ancillary data in the receive path.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 13:53:08 -05:00
Tom Parkin 73df66f8b1 ipv6: rename datagram_send_ctl and datagram_recv_ctl
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific.  Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 13:53:08 -05:00
YOSHIFUJI Hideaki / 吉藤英明 c33e7b05f1 ipv6 anycast: Convert ipv6_sk_ac_lock to spinlock.
Since all users are write-lock, it does not make sense to use
rwlock here.  Use simple spinlock.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 18367681a1 ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 d3aedd5ebd ipv6 flowlabel: Convert hash list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 f256dc59d0 ipv6 flowlabel: Ensure to take lock when modifying np->ip6_sk_fl_list.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:41:12 -05:00
Marcelo Ricardo Leitner bd30e94720 ipv6: do not create neighbor entries for local delivery
They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.

Note that IPv4 already handles it this way. No neighbor entries are
created for local input.

Tested by myself and customer.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 20:26:07 -05:00
YOSHIFUJI Hideaki / 吉藤英明 d9e85655b5 netfilter ip6table_mangle: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明 ff88b30c71 xfrm: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明 07c2fecc36 ipv6 mcast: Use ipv6_addr_equal() in ip6_mc_source().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明 5e98a36ed4 ipv6 addrconf: Fix interface identifiers of 802.15.4 devices.
The "Universal/Local" (U/L) bit must be complmented according to RFC4944
and RFC2464.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:43:04 -05:00
David S. Miller f1e7b73acc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:32:13 -05:00
Hannes Frederic Sowa 218774dc34 ipv6: add anti-spoofing checks for 6to4 and 6rd
This patch adds anti-spoofing checks in sit.c as specified in RFC3964
section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the
checks which could easily be implemented with netfilter.

Specifically this patch adds following logic (based loosely on the
pseudocode in RFC3964 section 5.2):

if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
        and outer_src_v4 != embedded_ipv4 (inner_src_v6)
                drop
if prefix (inner_dst_v6) == rd6_prefix (or 2002::/16 is the default)
        and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
                drop
accept

To accomplish the specified security checks proposed by above RFCs,
it is still necessary to employ uRPF filters with netfilter. These new
checks only kick in if the employed addresses are within the 2002::/16 or
another range specified by the 6rd-prefix (which defaults to 2002::/16).

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:22:03 -05:00
Eric W. Biederman 243bb4c6d7 ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled
When attempting to build linux-next with user namespaces enabled I ran
into this fun build error.

  CC      net/ipv6/inet6_connection_sock.o
.../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
.../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using
 type ‘kuid_t’
.../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
.../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
make[2]: *** [net/ipv6] Error 2
make[2]: *** Waiting for unfinished jobs....

Using kuid_t instead of int to hold the uid fixes this.

Cc: Tom Herbert <therbert@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:20:12 -05:00
Jiri Pirko 5c766d642b ipv4: introduce address lifetime
There are some usecase when lifetime of ipv4 addresses might be helpful.
For example:
1) initramfs networkmanager uses a DHCP daemon to learn network
configuration parameters
2) initramfs networkmanager addresses, routes and DNS configuration
3) initramfs networkmanager is requested to stop
4) initramfs networkmanager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If
the system doesn't start networkmanager for some reason, addresses and
routes will be used forever, which violates RFC 2131.

This patch is essentially a backport of ivp6 address lifetime mechanism
for ipv4 addresses.

Current "ip" tool supports this without any patch (since it does not
distinguish between ipv4 and ipv6 addresses in this perspective.

Also, this should be back-compatible with all current netlink users.

Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:59:57 -05:00
Jesper Dangaard Brouer 3ef0eb0db4 net: frag, move LRU list maintenance outside of rwlock
Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock.  However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.

Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:24 -05:00
Jesper Dangaard Brouer d433673e5f net: frag helper functions for mem limit tracking
This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number atomic operation, during freeing of
a frag queue.  This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 13:36:24 -05:00
Eric Dumazet cef401de7b net: fix possible wrong checksum generation
Pravin Shelar mentioned that GSO could potentially generate
wrong TX checksum if skb has fragments that are overwritten
by the user between the checksum computation and transmit.

He suggested to linearize skbs but this extra copy can be
avoided for normal tcp skbs cooked by tcp_sendmsg().

This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
in skb_shinfo(skb)->gso_type if at least one frag can be
modified by the user.

Typical sources of such possible overwrites are {vm}splice(),
sendfile(), and macvtap/tun/virtio_net drivers.

Tested:

$ netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
7.7.8.84 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3959.52

$ netperf -H 7.7.8.84 -t TCP_SENDFILE
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3216.80

Performance of the SENDFILE is impacted by the extra allocation and
copy, and because we use order-0 pages, while the TCP_STREAM uses
bigger pages.

Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:27:15 -05:00
Dan Carpenter 75356a8143 ip6mr: limit IPv6 MRT_TABLE identifiers
We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE
identifiers" but we need to do it for IPv6 as well.  On IPv6 the name
is "pim6reg" instead of "pimreg" so there is one less digit allowed.

The strcpy() is in ip6mr_reg_vif().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 19:31:03 -05:00
David S. Miller b640bee6d9 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This batch contains netfilter updates for you net-next tree, they are:

* The new connlabel extension for x_tables, that allows us to attach
  labels to each conntrack flow. The kernel implementation uses a
  bitmask and there's a file in user-space that maps the bits with the
  corresponding string for each existing label. By now, you can attach
  up to 128 overlapping labels. From Florian Westphal.

* A new round of improvements for the netns support for conntrack.
  Gao feng has moved many of the initialization code of each module
  of the netns init path. He also made several code refactoring, that
  code looks cleaner to me now.

* Added documentation for all possible tweaks for nf_conntrack via
  sysctl, from Jiri Pirko.

* Cisco 7941/7945 IP phone support for our SIP conntrack helper,
  from Kevin Cernekee.

* Missing header file in the snmp helper, from Stephen Hemminger.

* Finally, a couple of fixes to resolve minor issues with these
  changes, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 00:56:10 -05:00
Tom Herbert 72289b96c9 soreuseport: UDP/IPv6 implementation
Motivation for soreuseport would be something like a DNS server.  An
alternative would be to recv on the same socket from multiple threads.
As in the case of TCP, the load across these threads tends to be
disproportionate and we also see a lot of contection on the socket lock.
Note that SO_REUSEADDR already allows multiple UDP sockets to bind to
the same port, however there is no provision to prevent hijacking and
nothing to distribute packets across all the sockets sharing the same
bound port.  This patch does not change the semantics of SO_REUSEADDR,
but provides usable functionality of it for unicast.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 13:44:01 -05:00
Tom Herbert 5ba24953e9 soreuseport: TCP/IPv6 implementation
Motivation for soreuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have it's own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming simple event loop:
while (1) { accept(); process() }, wakeup does not promote fairness
among the sockets.  We have seen the  disproportion to be as high
as 3:1 ratio between thread accepting most connections and the one
accepting the fewest.  With so_reusport the distribution is
uniform.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 13:44:01 -05:00
Gao feng c296bb4d5d netfilter: nf_conntrack: refactor l4proto support for netns
Move the code that register/unregister l4proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l4proto_register
nf_ct_l4proto_pernet_register
nf_ct_l4proto_unregister
nf_ct_l4proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:40:53 +01:00
Gao feng 6330750d56 netfilter: nf_conntrack: refactor l3proto support for netns
Move the code that register/unregister l3proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l3proto_register
nf_ct_l3proto_pernet_register
nf_ct_l3proto_unregister
nf_ct_l3proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:39:20 +01:00
Cong Wang 9647bb80a5 ipv6: remove duplicated declaration of ip6_fragment()
It is declared in:
include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));

and net/ip6_route.h is already included.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 23:18:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明 0cc8d8df9b netfilter: Use IS_ERR_OR_NULL().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 14:28:29 -05:00
YOSHIFUJI Hideaki / 吉藤英明 3f0d2ba0bd ipv6: Use IS_ERR_OR_NULL().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 14:28:28 -05:00
David S. Miller 0c8729c9b9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) The transport header did not point to the right place after
   esp/ah processing on tunnel mode in the receive path. As a
   result, the ECN field of the inner header was not set correctly,
   fixes from Li RongQing.

2) We did a null check too late in one of the xfrm_replay advance
   functions. This can lead to a division by zero, fix from
   Nickolai Zeldovich.

3) The size calculation of the hash table missed the muiltplication
   with the actual struct size when the hash table is freed.
   We might call the wrong free function, fix from Michal Kubecek.

4) On IPsec pmtu events we can't access the transport headers of
   the original packet, so force a relookup for all routes
   to notify about the pmtu event.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 14:20:28 -05:00
YOSHIFUJI Hideaki / 吉藤英明 b820bb6b99 ndisc: Do not try to update "updated" time if neighbour has already gone.
Commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
introduce a bug to try to update "updated" time in neighbour
structure.
Update the "updated" time only if neighbour is available.

Bug was found by Dan Carpenter <dan.carpenter@oracle.com>

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 15:41:41 -05:00
Nicolas Dichtel 660b26dc1a mcast: add multicast proxy support (IPv4 and IPv6)
This patch add the support of proxy multicast, ie being able to build a static
multicast tree. It adds the support of (*,*) and (*,G) entries.

The user should define an (*,*) entry which is not used for real forwarding.
This entry defines the upstream in iif and contains all interfaces from the
static tree in its oifs. It will be used to forward packet upstream when they
come from an interface belonging to the static tree.
Hence, the user should define (*,G) entries to build its static tree. Note that
upstream interface must be part of oifs: packets are sent to all oifs
interfaces except the input interface. This ensures to always join the whole
static tree, even if the packet is not coming from the upstream interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:55:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 4d5c152e86 ndisc: Use compound literals to build redirect message.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明 1cb3fe513f ndisc: Break down ndisc_build_skb() and build message directly.
Construct NS/NA/RS message directly using C99 compound literals.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明 b44b5f4ae9 ndisc: Break down __ndisc_send().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:17 -05:00
YOSHIFUJI Hideaki / 吉藤英明 7b3d9b06d8 ndisc: Fill in ICMPv6 checksum and IPv6 header in ndisc_send_skb().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:17 -05:00
YOSHIFUJI Hideaki / 吉藤英明 f4de84c64e ndisc: Use ndisc_send_skb() for redirect.
Reuse dst if one is attached with skb.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:17 -05:00
YOSHIFUJI Hideaki / 吉藤英明 aa4bdd4b3f ndisc: Remove icmp6h argument from ndisc_send_skb().
skb_transport_header() (thus icmp6_hdr()) is available here,
use it.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:17 -05:00
YOSHIFUJI Hideaki / 吉藤英明 5f5a011563 ndisc: Make ndisc_fill_xxx_option() for sk_buff.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:16 -05:00
YOSHIFUJI Hideaki / 吉藤英明 2ce1357614 ndisc: Calculate message body length and option length separately.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:16 -05:00
YOSHIFUJI Hideaki / 吉藤英明 5135e633f9 ndisc: Reset skb->trasport_headner inside ndisc_alloc_send_skb().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:16 -05:00
YOSHIFUJI Hideaki / 吉藤英明 527a150fb2 ndisc: Defer building IPv6 header.
Build ICMPv6 message first and make buffer management easier;
we can use skb->len when filling checksum in ICMPv6 header,
and then build IP header with length field.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:16 -05:00
YOSHIFUJI Hideaki / 吉藤英明 af9a997629 ndisc: Remove dev argument for ndisc_send_skb().
Since we have skb->dev, use it.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 f382d03ad0 ndisc: Set skb->dev and skb->protocol inside ndisc_alloc_skb().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 c8d6c380d9 ndisc: Simplify arguments for ip6_nd_hdr().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 2576f17dfa ipv6: Unshare ip6_nd_hdr() and change return type to void.
- move ip6_nd_hdr() to its users' source files.
  In net/ipv6/mcast.c, it will be called ip6_mc_hdr().
- make return type to void since this function never fails.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 de09334b93 ndisc: Introduce ndisc_alloc_skb() helper.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 9c86dafe94 ndisc: Introduce ndisc_fill_redirect_hdr_option().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6bce6b4e16 ndisc: Use skb_linearize() instead of pskb_may_pull(skb, skb->len).
Suggested by Eric Dumazet <edumazet@google.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 c558e9fca8 ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h.
This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option()
use ndisc_opt_addr_space().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 315ff09dba ndisc: Reduce number of arguments for ndisc_fill_addr_option().
Add pointer to struct net_device (dev) and remove
data_len (= dev->addr_len) and addr_type (= dev->type).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 13:33:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 fb568637e5 ndisc: Make several arguments for ndisc_send_na() boolean.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:49 -05:00
YOSHIFUJI Hideaki / 吉藤英明 ca97a644d7 ipv6: Introduce ipv6_addr_is_solict_mult() to check Solicited Node Multicast Addresses.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 22:29:49 -05:00
Hannes Frederic Sowa 1ad759d847 ipv6: remove unneeded check to pskb_may_pull in ipip6_rcv
This is already checked by the caller (tunnel64_rcv) and brings ipip6_rcv
in line with ipip_rcv.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:43:51 -05:00
YOSHIFUJI Hideaki / 吉藤英明 115b0aa6b4 ndisc: Check NS message length before access.
Check message length before accessing "target" field,
as we do for other types.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 12fd84f438 ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers.
Because of rt->n removal, we do not need neigh argument any more.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:41:13 -05:00
Steffen Klassert 6f809da27c ipv6: Add an error handler for icmp6
pmtu and redirect events are now handled in the protocols error handler,
so add an error handler for icmp6 to do this. It is needed in the case
when we have no socket context. Based on a patch by Duan Jiong.

Reported-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:19:42 -05:00
Greg Kroah-Hartman ed408f7c0f Merge 3.9-rc4 into driver-core-next
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 19:48:18 -08:00
YOSHIFUJI Hideaki / 吉藤英明 887c95cc1d ipv6: Complete neighbour entry removal from dst_entry.
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6fd6ce2056 ipv6: Do not depend on rt->n in ip6_finish_output2().
If neigh is not found, create new one.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 707be1ff3d ipv6: Do not depend on rt->n in ip6_dst_lookup_tail().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 2152caea71 ipv6: Do not depend on rt->n in rt6_probe().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 145a36217a ipv6: Do not depend on rt->n in rt6_check_neigh().
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 c440f1609b ipv6: Do not depend on rt->n in ip6_pol_route().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明 dd0cbf29b1 ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway.
Do not depend on rt->n.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明 8e022ee63f ndisc: Remove tbl argument for __ipv6_neigh_lookup().
We can refer to nd_tbl directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明 7ff74a596b ndisc: Update neigh->updated with write lock.
neigh->nud_state and neigh->updated are under protection of
neigh->lock.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 18:38:18 -05:00
Romain KUNTZ 7efdba5bd9 ipv6: fix header length calculation in ip6_append_data()
Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.

rt->rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt->dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.

This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 03:37:13 -05:00
David S. Miller 4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6059283378 ipv6 netevent: Remove old_neigh from netevent_redirect.
The only user is cxgb3 driver.

old_neigh is used to check device change, but it must not happen
on redirect.  In this sense, we can remove old_neigh argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:04:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明 dd3332bfcb ipv6: Store Router Alert option in IP6CB directly.
Router Alert option is very small and we can store the value
itself in the skb.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 2b464f61f0 ipv6 xfrm: Use ipv6_addr_hash() in xfrm6_tunnel_spi_hash_byaddr().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 c08977bb2b ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 daad151263 ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 e7219858ac ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass().
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced
ipv6_tclass(), but similar function is already available as
ipv6_get_dsfield().

We might be able to call ipv6_tclass() from ipv6_get_dsfield(),
but it is confusing to have two versions.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6502ca527f ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Kees Cook f9ceb16ec4 net/ipv6: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
2013-01-11 11:40:01 -08:00
Romain Kuntz 21caa6622b ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
Replace ip6_route_lookup() with addrconf_get_prefix_route() when
looking up for a prefix route. This ensures that the connected prefix
is looked up in the main table, and avoids the selection of other
matching routes located in different tables as well as blackhole
or prohibited entries.

In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6:
del unreachable route when an addr is deleted on lo), that would occur
when a blackhole or prohibited entry is selected by ip6_route_lookup().
Such entries have a NULL rt6i_table argument, which is accessed by
__ip6_del_rt() when trying to lock rt6i_table->tb6_lock.

The function addrconf_is_prefix_route() is not used anymore and is
removed.

[v2] Minor indentation cleanup and log updates.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:22:54 -08:00
Romain Kuntz 85da53bf1c ipv6: fix the noflags test in addrconf_get_prefix_route
The tests on the flags in addrconf_get_prefix_route() does no make
much sense: the 'noflags' parameter contains the set of flags that
must not match with the route flags, so the test must be done
against 'noflags', and not against 'flags'.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10 14:13:33 -08:00
YOSHIFUJI Hideaki / 吉藤英明 6c40d100ce ipv6: Use container_of macro instead of magic number to get ipv6 header.
In ipv6_recv_error(), addr_offset points to daddr field of the ip header.
To get ipv6 header, use container_of() macro instead of substracting magic
number (24).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:59:53 -08:00
YOSHIFUJI Hideaki / 吉藤英明 ba96bcbcd2 ipv6: Use FIELD_SIZEOF() in inet6_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
Cong Wang acb3e04119 ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library
As suggested by David, udp6_csum_init() is too big to be inlined,
move it to ipv6 static library, net/ipv6/ip6_checksum.c.

And the generic csum_ipv6_magic() too.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 17:56:10 -08:00
Li RongQing a9403f8aeb ah6/esp6: set transport header correctly for IPsec tunnel mode.
IPsec tunnel does not set ECN field to CE in inner header when
the ECN field in the outer header is CE, and the ECN field in
the inner header is ECT(0) or ECT(1).

The cause is ipip6_hdr() does not return the correct address of
inner header since skb->transport-header is not the inner header
after esp6_input_done2(), or ah6_input().

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-08 12:41:30 +01:00
Hannes Frederic Sowa 5d134f1c1f tcp: make sysctl_tcp_ecn namespace aware
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware.  The reason behind this patch is to ease the testing
of ecn problems on the internet and allows applications to tune their
own use of ecn.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:09:56 -08:00
YOSHIFUJI Hideaki / 吉藤英明 71bcdba06d ndisc: Use struct rd_msg for redirect message.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:08:38 -08:00
YOSHIFUJI Hideaki / 吉藤英明 b7dc8c3959 ndisc: Remove unused space at tail of skb for ndisc messages. (TAKE 3)
Currently, the size of skb allocated for NDISC is MAX_HEADER +
LL_RESERVED_SPACE(dev) + packet length + dev->needed_tailroom,
but only LL_RESERVED_SPACE(dev) bytes is "reserved" for headers.
As a result, the skb looks like this (after construction of the
message):

head       data                   tail                       end
+--------------------------------------------------------------+
+           |                      |          |                |
+--------------------------------------------------------------+
|<-hlen---->|<---ipv6 packet------>|<--tlen-->|<--MAX_HEADER-->|
    =LL_                               = dev
     RESERVED_                           ->needed_
     SPACE(dev)                            tailroom

As the name implies, "MAX_HEADER" is used for headers, and should
be "reserved" in prior to packet construction.  Or, if some space
is really required at the tail of ther skb, it should be
explicitly documented.

We have several option after construction of NDISC message:

Option 1:

head       data                   tail       end
+---------------------------------------------+
+           |                      |          |
+---------------------------------------------+
|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
   =LL_                                = dev
    RESERVED_                           ->needed_
    SPACE(dev)                            tailroom

Option 2:

head            data                   tail       end
+--------------------------------------------------+
+                |                      |          |
+--------------------------------------------------+
|<--MAX_HEADER-->|<---ipv6 packet------>|<--tlen-->|
                                            = dev
                                             ->needed_
                                               tailroom

Option 3:

head                        data                   tail       end
+--------------------------------------------------------------+
+                |           |                      |          |
+--------------------------------------------------------------+
|<--MAX_HEADER-->|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
                    =LL_                                = dev
                     RESERVED_                          ->needed_
                     SPACE(dev)                           tailroom

Our tunnel drivers try expanding headroom and the space for tunnel
encapsulation was not a mandatory space -- so we are not seeing
bugs here --, but just for optimization for performance critial
situations.

Since NDISC messages are not performance critical unlike TCP,
and as we know outgoing device, LL_RESERVED_SPACE(dev) should be
just enough for the device in most (if not all) cases:
  LL_RESERVED_SPACE(dev) <= LL_MAX_HEADER <= MAX_HEADER
Note that LL_RESERVED_SPACE(dev) is also enough for NDISC over
SIT (e.g., ISATAP).

So, I think Option 1 is just fine here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-04 15:16:44 -08:00
Ulrich Weber 429da4c0b1 netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation
csum16_add() has a broken carry detection, should be:
sum += sum < (__force u16)b;

Instead of fixing csum16_add, remove the custom checksum
functions and use the generic csum_add/csum_sub ones.

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-04 20:03:02 +01:00
David S. Miller ac196f8c92 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following batch contains Netfilter fixes for 3.8-rc1. They are
a mixture of old bugs that have passed unnoticed (I'll pass these to
stable) and more fresh ones from the previous merge window, they are:

* Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd
  showing up wrong address, from Bob Hockney.

* Fix a comment in nf_conntrack_ipv6, from Florent Fourcot.

* Fix a leak an error path in ctnetlink while creating an expectation,
  from Jesper Juhl.

* Fix missing ICMP time exceeded in the IPv6 defragmentation code, from
  Haibo Xi.

* Fix inconsistent handling of routing changes in MASQUERADE for the
  new connections case, from Andrew Collins.

* Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to
  crashes in the ixgbe driver (since it seems to access the transport
  header with TSO enabled), from Mukund Jampala.

* Recover obsoleted NOTRACK target by including it into the CT and spot
  a warning via printk about being obsoleted. Many people don't check the
  scheduled to be removal file under Documentation, so we follow some
  less agressive approach to kill this in a year or so. Spotted by Florian
  Westphal, patch from myself.

* Fix race condition in xt_hashlimit that allows to create two or more
  entries, from myself.

* Fix crash if the CT is used due to the recently added facilities to
  consult the dying and unconfirmed conntrack lists, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28 14:28:17 -08:00
Isaku Yamahata ae782bb16c ipv6/ip6_gre: set transport header correctly
ip6gre_xmit2() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
Set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)

Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 15:19:56 -08:00
Cong Ding bd7790286b ipv6: addrconf.c: remove unnecessary "if"
the value of err is always negative if it goes to errout, so we don't need to
check the value of err.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19 12:50:06 -08:00
Haibo Xi 97cf00e93c netfilter: nf_ct_reasm: fix conntrack reassembly expire code
Commit b836c99fd6 (ipv6: unify conntrack reassembly expire
code with standard one) use the standard IPv6 reassembly
code(ip6_expire_frag_queue) to handle conntrack reassembly expire.

In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get
which device received this expired packet.so we must save ifindex
when NF_conntrack get this packet.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 1/2 fragmented
IPv6 packet.

Signed-off-by: Haibo Xi <haibbo@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:41:25 +01:00
Florent Fourcot d7a769ff0e netfilter: nf_conntrack_ipv6: fix comment for packets without data
Remove ambiguity of double negation.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:28:31 +01:00
Andrew Collins c65ef8dc7b netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE
target), the MASQUERADE target handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections.  It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.

This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.

Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:28:30 +01:00
Mukund Jampala c6f408996c netfilter: ip[6]t_REJECT: fix wrong transport header pointer in TCP reset
The problem occurs when iptables constructs the tcp reset packet.
It doesn't initialize the pointer to the tcp header within the skb.
When the skb is passed to the ixgbe driver for transmit, the ixgbe
driver attempts to access the tcp header and crashes.
Currently, other drivers (such as our 1G e1000e or igb drivers) don't
access the tcp header on transmit unless the TSO option is turned on.

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000d
<1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>*pdpt = 0000000085e5d001 *pde = 0000000000000000
<0>Oops: 0000 [#1] SMP
[...]
<4>Pid: 0, comm: swapper Tainted: P            2.6.35.12 #1 Greencity/Thurley
<4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16
<4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000
<4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48
<4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
<0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000)
<0>Stack:
<4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002
<4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318
<4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002
<0>Call Trace:
<4> [<d0d176c9>] ? 0xd0d176c9
<4> [<d0d18a4d>] ? 0xd0d18a4d
<4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7
<4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114
<4> [<411f056a>] ? __qdisc_run+0xca/0xe0
<4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0
<4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f
<4> [<411e94a1>] ? neigh_update+0x29c/0x330
<4> [<4121cf29>] ? arp_process+0x49c/0x4cd
<4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121c6d5>] ? T.901+0x38/0x3b
<4> [<4121c918>] ? arp_rcv+0xa3/0xb4
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346
<4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f
<4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30
<4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe]
<4> [<41013468>] ? lapic_next_event+0x13/0x16
<4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4
<4> [<411e1b03>] ? net_rx_action+0x55/0x127
<4> [<4102da1a>] ? __do_softirq+0x77/0xeb
<4> [<4102dab1>] ? do_softirq+0x23/0x27
<4> [<41003a67>] ? do_IRQ+0x7d/0x8e
<4> [<41002a69>] ? common_interrupt+0x29/0x30
<4> [<41007bcf>] ? mwait_idle+0x48/0x4d
<4> [<4100193b>] ? cpu_idle+0x37/0x4c
<0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38
ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00
<0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24
<0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP

Signed-off-by: Mukund Jampala <jbmukund@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:27:35 +01:00
Simon Arlott df48419107 ipv6: Fix Makefile offload objects
The following commit breaks IPv6 TCP transmission for me:
	Commit 75fe83c322
	Author: Vlad Yasevich <vyasevic@redhat.com>
	Date:   Fri Nov 16 09:41:21 2012 +0000
	ipv6: Preserve ipv6 functionality needed by NET

This patch fixes the typo "ipv6_offload" which should be
"ipv6-offload".

I don't know why not including the offload modules should
break TCP. Disabling all offload options on the NIC didn't
help. Outgoing pulseaudio traffic kept stalling.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-16 09:15:53 -08:00
Christoph Paasch e337e24d66 inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
    [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
    [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
    [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
    [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
    [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
    [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
    [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
    [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
    [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
    [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
    [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
    [<ffffffff814cee68>] ip_rcv+0x217/0x267
    [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
    [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Duan Jiong 093d04d42f ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
In function ndisc_redirect_rcv(), the skb->data points to the transport
header, but function icmpv6_notify() need the skb->data points to the
inner IP packet. So before using icmpv6_notify() to propagate redirect,
change skb->data to point the inner IP packet that triggered the sending
of the Redirect, and introduce struct rd_msg to make it easy.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
YOSHIFUJI Hideaki / 吉藤英明 7bdc1b4aba ndisc: Fix padding error in link-layer address option.
If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
post padding is not initialized correctly.

(Un)fortunately, the only type that requires pad is Infiniband,
whose pad is 2 and data_len is 20, and this logical error has not
become obvious, but it is better to fix.

Note that ndisc_opt_addr_space() handles the situation described
above correctly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
YOSHIFUJI Hideaki fd0ea7dbfa ndisc: Unexport ndisc_{build,send}_skb().
These symbols were exported for bonding device by commit 305d552a
("bonding: send IPv6 neighbor advertisement on failover").

It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle
NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by
commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'").

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 12:42:29 -05:00
Eric Dumazet 0e1efe9d5e ipv6: avoid taking locks at socket dismantle
ipv6_sock_mc_close() is called for ipv6 sockets at close time, and most
of them don't use multicast.

Add a test to avoid contention on a shared spinlock.

Same heuristic applies for ipv6_sock_ac_close(), to avoid contention
on a shared rwlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00