Commit Graph

3429 Commits

Author SHA1 Message Date
Arnd Bergmann 75840f1080 IB/qib: fix false-postive maybe-uninitialized warning
commit f6aafac184a3e46e919769dd4faa8bf0dc436534 upstream.

aarch64-linux-gcc-7 complains about code it doesn't fully understand:

drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code is right, and despite trying hard, I could not come up with a version
that I liked better than just adding a fake initialization here to shut up the
warning.

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 10:46:02 +01:00
Feras Daoud 90326945a4 IB/ipoib: rtnl_unlock can not come after free_netdev
commit 89a3987ab7a923c047c6dec008e60ad6f41fac22 upstream.

The ipoib_vlan_add function calls rtnl_unlock after free_netdev,
rtnl_unlock not only releases the lock, but also calls netdev_run_todo.
The latter function browses the net_todo_list array and completes the
unregistration of all its net_device instances. If we call free_netdev
before rtnl_unlock, then netdev_run_todo call over the freed device causes
panic.
To fix, move rtnl_unlock call before free_netdev call.

Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-11-02 10:46:02 +01:00
Steve Wise 11f43b27cb rdma_cm: fail iwarp accepts w/o connection params
commit f2625f7db4dd0bbd16a9c7d2950e7621f9aa57ad upstream.

cma_accept_iw() needs to return an error if conn_params is NULL.
Since this is coming from user space, we can crash.

Reported-by: Shaobo He <shaobo@cs.utah.edu>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-20 14:04:24 +02:00
Feras Daoud 2c318737ec IB/ipoib: Fix deadlock between rmmod and set_mode
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream.

When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:47:02 +02:00
Saeed Mahameed 7d26287eee IB/mlx4: Fix port query for 56Gb Ethernet links
commit 6fa26208206c406fa529cd73f7ae6bf4181e270b upstream.

Report the correct speed in the port attributes when using a 56Gbps
ethernet link.  Without this change the field is incorrectly set to 10.

Fixes: a9c766bb75 ('IB/mlx4: Fix info returned when querying IBoE ports')
Fixes: 2e96691c31 ('IB: Use central enum for speed instead of hard-coded values')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:55 +02:00
Maor Gottlieb dcb7311dc3 IB/mlx4: Set traffic class in AH
commit af4295c117b82a521b05d0daf39ce879d26e6cb1 upstream.

Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c41069 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:55 +02:00
Bart Van Assche 029555a52a IB/multicast: Check ib_find_pkey() return value
commit d3a2418ee36a59bc02e9d454723f3175dcf4bfd9 upstream.

This patch avoids that Coverity complains about not checking the
ib_find_pkey() return value.

Fixes: commit 547af76521 ("IB/multicast: Report errors on multicast groups if P_key changes")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:50 +02:00
Bart Van Assche 67ae25b4d1 IB/mad: Fix an array index check
commit 2fe2f378dd45847d2643638c07a7658822087836 upstream.

The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).

Fixes: commit b7ab0b19a8 ("IB/mad: Verify mgmt class in received MADs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-06-08 00:46:50 +02:00
Mark Bloch aa19a88935 IB/cm: Mark stale CM id's whenever the mad agent was unregistered
commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0 upstream.

When there is a CM id object that has port assigned to it, it means that
the cm-id asked for the specific port that it should go by it, but if
that port was removed (hot-unplug event) the cm-id was not updated.
In order to fix that the port keeps a list of all the cm-id's that are
planning to go by it, whenever the port is removed it marks all of them
as invalid.

This commit fixes a kernel panic which happens when running traffic between
guests and we force reboot a guest mid traffic, it triggers a kernel panic:

 Call Trace:
  [<ffffffff815271fa>] ? panic+0xa7/0x16f
  [<ffffffff8152b534>] ? oops_end+0xe4/0x100
  [<ffffffff8104a00b>] ? no_context+0xfb/0x260
  [<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
  [<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
  [<ffffffff81084240>] ? process_timeout+0x0/0x10
  [<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
  [<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
  [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
  [<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
  [<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
  [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
  [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
  [<ffffffff8152a815>] ? page_fault+0x25/0x30
  [<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
  [<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
  [<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
  [<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
  [<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
  [<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
  [<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
  [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
  [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
  [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
  [<ffffffff8109aef6>] ? kthread+0x96/0xa0
  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
  [<ffffffff8109ae60>] ? kthread+0x0/0xa0
  [<ffffffff8100c200>] ? child_rip+0x0/0x20

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:38 +01:00
Tariq Toukan 52aac91d3a IB/uverbs: Fix leak of XRC target QPs
commit 5b810a242c28e1d8d64d718cebe75b79d86a0b2d upstream.

The real QP is destroyed in case of the ref count reaches zero, but
for XRC target QPs this call was missed and caused to QP leaks.

Let's call to destroy for all flows.

Fixes: 0e0ec7e063 ('RDMA/core: Export ib_open_qp() to share XRC...')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:38 +01:00
Matan Barak 1aecb8e417 IB/mlx4: Fix create CQ error flow
commit 593ff73bcfdc79f79a8a0df55504f75ad3e5d1a9 upstream.

Currently, if ib_copy_to_udata fails, the CQ
won't be deleted from the radix tree and the HW (HW2SW).

Fixes: 225c7b1fee ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:38 +01:00
Alex Vesker 95bc51b504 IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
commit e5ac40cd66c2f3cd11bc5edc658f012661b16347 upstream.

Because of an incorrect bit-masking done on the join state bits, when
handling a join request we failed to detect a difference between the
group join state and the request join state when joining as send only
full member (0x8). This caused the MC join request not to be sent.
This issue is relevant only when SRIOV is enabled and SM supports
send only full member.

This fix separates scope bits and join states bits a nibble each.

Fixes: b9c5d6a643 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:37 +01:00
Alex Vesker b81459c789 IB/ipoib: Don't allow MC joins during light MC flush
commit 344bacca8cd811809fc33a249f2738ab757d327f upstream.

This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411]  0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960]  0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547]  ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015]  [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831]  [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403]  [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706]  [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384]  [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031]  [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220]  [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290]  [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911]  [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741]  [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091]  [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880]  [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848]  [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474]  [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510]  [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349]  [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471]  [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152]  [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274]  [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896]  [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632]  [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709]  [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:37 +01:00
Erez Shitrit 4f0992c3ce IB/core: Fix use after free in send_leave function
commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream.

The function send_leave sets the member: group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before the setting and it might delete the group object,
and will get a memory corruption.

Additionally, this patch gets rid of group->query_id variable which is
not used.

Fixes: faec2f7b96 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:37 +01:00
Erez Shitrit 27816fef78 IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream.

When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:36 +01:00
Yishai Hadas 9a7edde8c0 IB/mlx4: Fix the SQ size of an RC QP
commit f2940e2c76bb554a7fbdd28ca5b90904117a9e96 upstream.

When calculating the required size of an RC QP send queue, leave
enough space for masked atomic operations, which require more space than
"regular" atomic operation.

Fixes: 6fa8f71984 ("IB/mlx4: Add support for masked atomic operations")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:22 +02:00
Erez Shitrit 0c3c7f4ca9 IB/IPoIB: Don't update neigh validity for unresolved entries
commit 61c78eea9516a921799c17b4c20558e2aa780fd3 upstream.

ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send.  This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it.  If the queue for this neighbor is full, then don't update the
alive timestamp.  That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:22 +02:00
Jason Gunthorpe 55ac348c64 IB/security: Restrict use of the write() interface
commit e6bd18f57aad1a2d1ef40e646d03ed0f2515c9e3 upstream.

The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
[wt: no hfi1 subdir in 3.10. A minimal rdma/ib.h had to be created
 from 3.11 sources to keep the code similar to mainline]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:21 +02:00
Jason Gunthorpe f40b2f17ed IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
commit 8c5122e45a10a9262f872b53f151a592e870f905 upstream.

When this code was reworked for IBoE support the order of assignments
for the sl_tclass_flowlabel got flipped around resulting in
TClass & FlowLabel being permanently set to 0 in the packet headers.

This breaks IB routers that rely on these headers, but only affects
kernel users - libmlx4 does this properly for user space.

Cc: stable@vger.kernel.org
Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:21 +02:00
Mike Marciniszyn b6c92a436f IB/qib: fix mcast detach when qp not attached
commit 09dc9cd6528f5b52bcbd3292a6312e762c85260f upstream.

The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:24 -08:00
Hariprasad S 1630624d53 iw_cxgb3: Fix incorrectly returning error on success
commit 67f1aee6f45059fd6b0f5b0ecb2c97ad0451f6b3 upstream.

The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integers values. So don't treat positive return values
as an error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[a pox on developers and maintainers who do not cc: stable for bug fixes like this - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:20 -08:00
Doron Tsur cef1bb63fd IB/cm: Fix rb-tree duplicate free and use-after-free
commit 0ca81a2840f77855bbad1b9f172c545c4dc9e6a4 upstream.

ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.

Fixes: a977049dac ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09 10:12:59 -08:00
Mike Marciniszyn ef5844a015 IB/qib: Change lkey table allocation to support more MRs
commit d6f1c17e162b2a11e708f28fa93f2f79c164b442 upstream.

The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.

The underlying kernel code cannot allocate that many contiguous pages.

There is no reason the underlying memory needs to be physically
contiguous.

This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
  o this matches the module parameter description

Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:52 -07:00
Noa Osherovich 2698f5747a IB/mlx4: Use correct SL on AH query under RoCE
commit 5e99b139f1b68acd65e36515ca347b03856dfb5a upstream.

The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.

Fixes: fa417f7b52 ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:34 +02:00
Jack Morgenstein a6d452e0f3 IB/mlx4: Forbid using sysfs to change RoCE pkeys
commit 2b135db3e81301d0452e6aa107349abe67b097d6 upstream.

The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All others indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.

Fixes: c1e7e46612 ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:33 +02:00
Yishai Hadas caf233503f IB/uverbs: Fix race between ib_uverbs_open and remove_one
commit 35d4a0b63dc0c6d1177d4f532a9deae958f0662c upstream.

Fixes: 2a72f21226 ("IB/uverbs: Remove dev_table")

Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.

In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:33 +02:00
Christoph Hellwig 939f804304 IB/uverbs: reject invalid or unknown opcodes
commit b632ffa7cee439ba5dce3b3bc4a5cbe2b3e20133 upstream.

We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:33 +02:00
Sagi Grimberg 45f29355a3 iser-target: release stale iser connections
commit 2f1b6b7d9a815f341b18dfd26a363f37d4d3c96a upstream.

When receiving a new iser connect request we serialize
the pending requests by adding the newly created iser connection
to the np accept list and let the login thread process the connect
request one by one (np_accept_wait).

In case we received a disconnect request before the iser_conn
has begun processing (still linked in np_accept_list) we should
detach it from the list and clean it up and not have the login
thread process a stale connection. We do it only when the connection
state is not already terminating (initiator driven disconnect) as
this might lead us to access np_accept_mutex after the np was released
in live shutdown scenarios.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:46 -07:00
Sagi Grimberg 394adc1d49 iser-target: Fix possible deadlock in RDMA_CM connection error
commit 4a579da2586bd3b79b025947ea24ede2bbfede62 upstream.

Before we reach to connection established we may get an
error event. In this case the core won't teardown this
connection (never established it), so we take care of freeing
it ourselves.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:46 -07:00
Erez Shitrit 9061537030 IB/mlx4: Fix WQE LSO segment calculation
commit ca9b590caa17bcbbea119594992666e96cde9c2f upstream.

The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.

It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.

The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.

An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.

Fixes: b832be1e40 ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:27 +02:00
Yann Droneaud d016c609f3 IB/core: don't disallow registering region starting at 0x0
commit 66578b0b2f69659f00b6169e6fe7377c4b100d18 upstream.

In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5e4
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().

There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:27 +02:00
Yann Droneaud b55c80ba21 IB/core: disallow registering 0-sized memory region
commit 8abaae62f3fdead8f4ce0ab46b4ab93dee39bab2 upstream.

If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:27 +02:00
Majd Dibbiny 94efa6abf1 IB/mlx4: Saturate RoCE port PMA counters in case of overflow
commit 61a3855bb726cbb062ef02a31a832dea455456e0 upstream.

For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.

Fixes: c37791349c ('IB/mlx4: Support PMA counters for IBoE')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:50 +02:00
Shachar Raindel 0cfcc3250e IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream.

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:47 +02:00
Eli Cohen 3af9e93341 IB/core: Avoid leakage from kernel to user space
commit 377b513485fd885dea1083a9a5430df65b35e048 upstream.

Clear the reserved field of struct ib_uverbs_async_event_desc which is
copied to user space.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:47 +02:00
Mitko Haralanov 1152730c69 IB/qib: Do not write EEPROM
commit 18c0b82a3e4501511b08d0e8676fb08ac08734a3 upstream.

This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.

These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-18 13:22:34 +01:00
Sagi Grimberg a3ecefb6bf iser-target: Fix implicit termination of connections
commit b02efbfc9a051b41e71fe8f94ddf967260e024a6 upstream.

In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.

So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.

The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).

We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Sagi Grimberg b80e6c5ae0 iser-target: Handle ADDR_CHANGE event for listener cm_id
commit ca6c1d82d12d8013fb75ce015900d62b9754623c upstream.

The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.

Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.

Squashed:

iser-target: Move cma_id setup to a function

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Sagi Grimberg b524a8828b iser-target: Fix connected_handler + teardown flow race
commit 19e2090fb246ca21b3e569ead51a6a7a1748eadd upstream.

Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.

In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.

Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().

Squashed:

iser-target: Decrement completion context active_qps in error flow

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Sagi Grimberg 022ff2f597 iser-target: Parallelize CM connection establishment
commit 2371e5da8cfe91443339b54444dec6254fdd6dfc upstream.

There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.

This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.

In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.

Squashed:

iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Sagi Grimberg dc0672f1f2 iser-target: Fix flush + disconnect completion handling
commit 128e9cc84566a84146baea2335b3824288eed817 upstream.

ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.

Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.

- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.

Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).

Squashed:

iser-target: Don't deffer disconnected handler to a work

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Sagi Grimberg 839eac57eb iscsi,iser-target: Initiate termination only once
commit 954f23722b5753305be490330cf2680b7a25f4a3 upstream.

Since commit 0fc4ea701fcf ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.

Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).

Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.

Squashed:

iscsi-target: Add call to wait_conn in establishment error flow

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:41 -08:00
Or Gerlitz f2da3c240d ib_isert: Add max_send_sge=2 minimum for control PDU responses
commit f57915cfa5b2b14c1cffa2e83c034f55e3f0e70d upstream.

This patch adds a max_send_sge=2 minimum in isert_conn_setup_qp()
to ensure outgoing control PDU responses with tx_desc->num_sge=2
are able to function correctly.

This addresses a bug with RDMA hardware using dev_attr.max_sge=3,
that in the original code with the ConnectX-2 work-around would
result in isert_conn->max_sge=1 being negotiated.

Originally reported by Chris with ocrdma driver.

Reported-by: Chris Moore <Chris.Moore@emulex.com>
Tested-by: Chris Moore <Chris.Moore@emulex.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:40 -08:00
Chris Moore c50ad63aab IB/isert: Adjust CQ size to HW limits
commit b1a5ad006b34ded9dc7ec64988deba1b3ecad367 upstream.

isert has an issue of trying to create a CQ with more CQEs than are
supported by the hardware, that currently results in failures during
isert_device creation during first session login.

This is the isert version of the patch that Minh Tran submitted for
iser, and is simple a workaround required to function with existing
ocrdma hardware.

Signed-off-by: Chris Moore <chris.moore@emulex.com>
Reviewied-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:40 -08:00
Sagi Grimberg 11b926d128 iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
commit 3b726ae2de02a406cc91903f80132daee37b6f1b upstream.

In this case the cm_id->context is the isert_np, and the cm_id->qp
is NULL, so use that to distinct the cases.

Since we don't expect any other events on this cm_id we can
just return -1 for explicit termination of the cm_id by the
cma layer.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06 15:05:49 -08:00
Bart Van Assche 4b5ba6a22b srp-target: Retry when QP creation fails with ENOMEM
commit ab477c1ff5e0a744c072404bf7db51bfe1f05b6e upstream.

It is not guaranteed to that srp_sq_size is supported
by the HCA. So if we failed to create the QP with ENOMEM,
try with a smaller srp_sq_size. Keep it up until we hit
MIN_SRPT_SQ_SIZE, then fail the connection.

Reported-by: Mark Lehrer <lehrer@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06 15:05:49 -08:00
Sagi Grimberg b997982f68 Target/iser: Don't put isert_conn inside disconnected handler
commit 0fc4ea701fcf5bc51ace4e288af5be741465f776 upstream.

disconnected_handler is invoked on several CM events (such
as DISCONNECTED, DEVICE_REMOVAL, TIMEWAIT_EXIT...). Since
multiple  events can occur while before isert_free_conn is
invoked, we might put all isert_conn references and free
the connection too early.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:12 -07:00
Sagi Grimberg 058ab45435 Target/iser: Get isert_conn reference once got to connected_handler
commit c2f88b17a1d97ca4ecd96cc22333a7a4f1407d39 upstream.

In case the connection didn't reach connected state, disconnected
handler will never be invoked thus the second kref_put on
isert_conn will be missing.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:12 -07:00
Bart Van Assche 70efec16cf IB/srp: Fix deadlock between host removal and multipathd
commit bcc05910359183b431da92713e98eed478edf83a upstream.

If scsi_remove_host() is invoked after a SCSI device has been blocked,
if the fast_io_fail_tmo or dev_loss_tmo work gets scheduled on the
workqueue executing srp_remove_work() and if an I/O request is
scheduled after the SCSI device had been blocked by e.g. multipathd
then the following deadlock can occur:

    kworker/6:1     D ffff880831f3c460     0   195      2 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff8105af6f>] msleep+0x2f/0x40
     [<ffffffff8123b0ae>] __blk_drain_queue+0x4e/0x180
     [<ffffffff8123d2d5>] blk_cleanup_queue+0x225/0x230
     [<ffffffffa0010732>] __scsi_remove_device+0x62/0xe0 [scsi_mod]
     [<ffffffffa000ed2f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
     [<ffffffffa0002eba>] scsi_remove_host+0x7a/0x130 [scsi_mod]
     [<ffffffffa07cf5c5>] srp_remove_work+0x95/0x180 [ib_srp]
     [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
     [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
     [<ffffffff810758bd>] kthread+0xed/0x110
     [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
    multipathd      D ffff880096acc460     0  5340      1 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff814ab79b>] io_schedule_timeout+0x9b/0xf0
     [<ffffffff814abe1c>] wait_for_completion_io_timeout+0xdc/0x110
     [<ffffffff81244b9b>] blk_execute_rq+0x9b/0x100
     [<ffffffff8124f665>] sg_io+0x1a5/0x450
     [<ffffffff8124fd21>] scsi_cmd_ioctl+0x2a1/0x430
     [<ffffffff8124fef2>] scsi_cmd_blk_ioctl+0x42/0x50
     [<ffffffffa00ec97e>] sd_ioctl+0xbe/0x140 [sd_mod]
     [<ffffffff8124bd04>] blkdev_ioctl+0x234/0x840
     [<ffffffff811cb491>] block_ioctl+0x41/0x50
     [<ffffffff811a0df0>] do_vfs_ioctl+0x300/0x520
     [<ffffffff811a1051>] SyS_ioctl+0x41/0x80
     [<ffffffff814b9962>] tracesys+0xd0/0xd5

Fix this by scheduling removal work on another workqueue than the
transport layer timers.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17 09:04:02 -07:00
Steve Wise 433d80d625 RDMA/iwcm: Use a default listen backlog if needed
commit 2f0304d21867476394cd51a54e97f7273d112261 upstream.

If the user creates a listening cm_id with backlog of 0 the IWCM ends
up not allowing any connection requests at all.  The correct behavior
is for the IWCM to pick a default value if the user backlog parameter
is zero.

Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
iwarp support without this fix.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17 09:04:00 -07:00