android_kernel_lge_bullhead

Commit Graph

Author	SHA1	Message	Date
Johannes Thumshirn	c243c6149a	scsi: don't BUG_ON() empty DMA transfers commit fd3fc0b4d7305fa7246622dcc0dec69c42443f45 upstream. Don't crash the machine just because of an empty transfer. Use WARN_ON() combined with returning an error. Found by Dmitry Vyukov and syzkaller. [ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON() might still be a cause of excessive log spamming. NOTE! If this warning ever triggers, we may end up leaking resources, since this doesn't bother to try to clean the command up. So this WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is much worse. People really need to stop using BUG_ON() for "this shouldn't ever happen". It makes pretty much any bug worse. - Linus ] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-20 14:03:22 +02:00
Christoph Hellwig	cf8ec2cf8d	scsi: move the nr_phys_segments assert into scsi_init_io commit 635d98b1d0cfc2ba3426a701725d31a6102c059a upstream. scsi_init_io should only be called for requests that transfer data, so move the assert that a request has segments from the callers into scsi_init_io. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-20 14:03:21 +02:00
James Bottomley	f8f3d2705f	scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream. When SCSI was written, all commands coming from the filesystem (REQ_TYPE_FS commands) had data. This meant that our signal for needing to complete the command was the number of bytes completed being equal to the number of bytes in the request. Unfortunately, with the advent of flush barriers, we can now get zero length REQ_TYPE_FS commands, which confuse this logic because they satisfy the condition every time. This means they never get retried even for retryable conditions, like UNIT ATTENTION because we complete them early assuming they're done. Fix this by special casing the early completion condition to recognise zero length commands with errors and let them drop through to the retry code. Reported-by: Sebastian Parschauer <s.parschauer@gmx.de> Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Tested-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> [ jwang: backport from upstream 4.7 to fix scsi resize issue ] Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-27 11:40:34 +02:00
Christoph Hellwig	75b319d686	scsi: remove scsi_end_request commit bc85dc500f9df9b2eec15077e5046672c46adeaa upstream. By folding scsi_end_request into its only caller we can significantly clean up the completion logic. We can use simple goto labels now to only have a single place to finish or requeue command there instead of the previous convoluted logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de> [jwang: backport to 3.12] Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-27 11:40:29 +02:00
Bart Van Assche	3e01cca39c	Defer processing of REQ_PREEMPT requests for blocked devices commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b upstream. SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-19 10:10:49 +02:00
James Bottomley	73e586351a	scsi: handle flush errors properly commit 89fb4cd1f717a871ef79fa7debbe840e3225cd54 upstream. Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Reported-by: Steven Haber <steven@qumulo.com> Tested-by: Steven Haber <steven@qumulo.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-08-07 14:30:25 -07:00
Lin Ming	9b21493c45	[SCSI] sd: use REQ_PM in sd's runtime suspend operation With the introduction of REQ_PM, modify sd's runtime suspend operation functions to use that flag so that the operations to put the device into runtime suspended state(i.e. sync cache and stop device) will not affect its runtime PM status. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-05-06 12:48:17 -07:00
Rafael J. Wysocki	53540098b2	ACPI / glue: Add .match() callback to struct acpi_bus_type USB uses the .find_bridge() callback from struct acpi_bus_type incorrectly, because as a result of the way it is used by USB every device in the system that doesn't have a bus type or parent is passed to usb_acpi_find_device() for inspection. What USB actually needs, though, is to call usb_acpi_find_device() for USB ports that don't have a bus type defined, but have usb_port_device_type as their device type, as well as for USB devices. To fix that replace the struct bus_type pointer in struct acpi_bus_type used for matching devices to specific subsystems with a .match() callback to be used for this purpose and update the users of struct acpi_bus_type, including USB, accordingly. Define the .match() callback routine for USB, usb_acpi_bus_match(), in such a way that it will cover both USB devices and USB ports and remove the now redundant .find_bridge() callback pointer from usb_acpi_bus. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>	2013-03-04 14:23:40 +01:00
Aaron Lu	6f4c827e68	[libata] scsi: no poll when ODD is powered off When the ODD is powered off, any action the user did to the ODD that would generate a media event will trigger an ACPI interrupt, so the poll for media event is no longer necessary. And the poll will also cause a runtime status change, which will stop the ODD from staying in powered off state, so the poll should better be stopped. But since we don't have access to the gendisk structure in LLDs, here comes the disk_events_disable_depth for scsi device. This field is a hint set by LLDs to convey information to upper layer drivers. A value of 0 means media poll is necessary for the device, while values above 0 means media poll is not needed and should better be skipped. So we can increase its value when we are to power off the ODD in ATA layer and decrease its value when the ODD is powered on, effectively silence the media events poll. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2013-01-25 15:36:43 -05:00
Linus Torvalds	60da5bf47d	Merge branch 'for-3.8/core' of git://git.kernel.dk/linux-block Pull block layer core updates from Jens Axboe: "Here are the core block IO bits for 3.8. The branch contains: - The final version of the surprise device removal fixups from Bart. - Don't hide EFI partitions under advanced partition types. It's fairly wide spread these days. This is especially dangerous for systems that have both msdos and efi partition tables, where you want to keep them in sync. - Cleanup of using -1 instead of the proper NUMA_NO_NODE - Export control of bdi flusher thread CPU mask and default to using the home node (if known) from Jeff. - Export unplug tracepoint for MD. - Core improvements from Shaohua. Reinstate the recursive merge, as the original bug has been fixed. Add plugging for discard and also fix a problem handling non pow-of-2 discard limits. There's a trivial merge in block/blk-exec.c due to a fix that went into 3.7-rc at a later point than -rc4 where this is based." * 'for-3.8/core' of git://git.kernel.dk/linux-block: block: export block_unplug tracepoint block: add plug for blkdev_issue_discard block: discard granularity might not be power of 2 deadline: Allow 0ms deadline latency, increase the read speed partitions: enable EFI/GPT support by default bsg: Remove unused function bsg_goose_queue() block: Make blk_cleanup_queue() wait until request_fn finished block: Avoid scheduling delayed work on a dead queue block: Avoid that request_fn is invoked on a dead queue block: Let blk_drain_queue() caller obtain the queue lock block: Rename queue dead flag bdi: add a user-tunable cpu_list for the bdi flusher threads block: use NUMA_NO_NODE instead of -1 block: recursive merge requests block CFQ: avoid moving request to different queue	2012-12-17 08:27:23 -08:00
Bart Van Assche	3f3299d5c0	block: Rename queue dead flag QUEUE_FLAG_DEAD is used to indicate that queuing new requests must stop. After this flag has been set queue draining starts. However, during the queue draining phase it is still safe to invoke the queue's request_fn, so QUEUE_FLAG_DYING is a better name for this flag. This patch has been generated by running the following command over the kernel source tree: git grep -lEw 'blk_queue_dead\|QUEUE_FLAG_DEAD' \| xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g' \ -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g'; \ sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \ include/linux/blkdev.h; \ sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \ -e 's/Dead queue/A dying queue/' block/blk-core.c Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <axboe@kernel.dk> Cc: Chanho Min <chanho.min@lge.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-12-06 14:30:58 +01:00
Martin K. Petersen	5db44863b6	[SCSI] sd: Implement support for WRITE SAME Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk driver. - We set the default maximum to 0xFFFF because there are several devices out there that only support two-byte block counts even with WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK LIMITS VPD. - max_write_same_blocks can be overriden per-device basis in sysfs. - The UNMAP discovery heuristics remain unchanged but the discard limits are tweaked to match the "real" WRITE SAME commands. - In the error handling logic we now distinguish between WRITE SAME with and without UNMAP set. The discovery process heuristics are: - If the device reports a SCSI level of SPC-3 or greater we'll issue READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is supported. If that's the case we will use it. - If the device supports the block limits VPD and reports a MAXIMUM WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16). - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond 0xFFFFFFFF or the block count exceeds 0xFFFF. - no_write_same is set for ATA, FireWire and USB. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-11-13 22:45:42 -08:00
James Bottomley	fe709ed827	Merge SCSI misc branch into isci-for-3.6 tag	2012-10-02 08:55:12 +01:00
Vikas Chaudhary	0e58076b37	[SCSI] scsi_lib: Set the device state from transport-offline to running FC and iSCSI class set SCSI devices to transport-offline state after fast_io_fail/replacement_timeout has fired, but after relogin, function scsi_internal_device_unblock() is not setting scsi device state to running. Due to this the devices even after being relogged in remain offline. Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-09-14 17:59:22 +01:00
Mike Snitzer	27c419739b	[SCSI] scsi_lib: fix scsi_io_completion's SG_IO error propagation The following v3.4-rc1 commit unmasked an existing bug in scsi_io_completion's SG_IO error handling: `47ac56d` [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Given that certain ILLEGAL_REQUEST are now properly categorized as TARGET_ERROR the host_byte is being set (before host_byte wasn't ever set for these ILLEGAL_REQUEST). In scsi_io_completion, initialize req->errors with cmd->result _after_ the SG_IO block that calls __scsi_error_from_host_byte (which may modify the host_byte). Before this fix: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0x10, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: sense buffer empty After: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid field in cdb Raw sense data (in hex): 70 00 05 00 00 00 00 0b 00 00 00 00 24 00 00 00 00 00 00 Reported-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Babu Moger <babu.moger@netapp.com> Cc: stable@vger.kernel.org # 3.4 Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-08-22 09:42:31 +04:00
Jeff Garzik	8407884dd9	Merge branch 'master' [vanilla Linus master] into libata-dev.git/upstream Two bits were appended to the end of the bitfield list in struct scsi_device. Resolve that conflict by including both bits. Conflicts: include/scsi/scsi_device.h	2012-07-25 15:58:48 -04:00
Bart Van Assche	b485462aca	[SCSI] Stop accepting SCSI requests before removing a device Avoid that the code for requeueing SCSI requests triggers a crash by making sure that that code isn't scheduled anymore after a device has been removed. Also, source code inspection of __scsi_remove_device() revealed a race condition in this function: no new SCSI requests must be accepted for a SCSI device after device removal started. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:41 +01:00
Bart Van Assche	84feb1664e	[SCSI] Change return type of scsi_queue_insert() into void The return value of scsi_queue_insert() is ignored by all its callers, hence change the return type of this function into void. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:41 +01:00
Bart Van Assche	940f5d47e2	[SCSI] Avoid dangling pointer in scsi_requeue_command() When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:40 +01:00
Bart Van Assche	67bd941300	[SCSI] Fix device removal NULL pointer dereference Use blk_queue_dead() to test whether the queue is dead instead of !sdev. Since scsi_prep_fn() may be invoked concurrently with __scsi_remove_device(), keep the queuedata (sdev) pointer in __scsi_remove_device(). This patch fixes a kernel oops that can be triggered by USB device removal. See also http://www.spinics.net/lists/linux-scsi/msg56254.html. Other changes included in this patch: - Swap the blk_cleanup_queue() and kfree() calls in scsi_host_dev_release() to make that code easier to grasp. - Remove the queue dead check from scsi_run_queue() since the queue state can change anyway at any point in that function where the queue lock is not held. - Remove the queue dead check from the start of scsi_request_fn() since it is redundant with the scsi_device_online() check. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:40 +01:00
Mike Christie	d075498c98	[SCSI] remove old comment from block/unblock functions We do not hold the host lock when calling these functions, so remove comment. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:23 +01:00
Mike Christie	5d9fb5cc1b	[SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new state This has scsi_internal_device_unblock/scsi_target_unblock take the new state to set the devices as an argument instead of always setting to running. The patch also converts users of these functions. This allows the FC and iSCSI class to transition devices from blocked to transport-offline, so that when fast_io_fail/replacement_timeout has fired we do not set the devices back to running. Instead, we set them to SDEV_TRANSPORT_OFFLINE. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:22 +01:00
Mike Christie	1b8d262061	[SCSI] add new SDEV_TRANSPORT_OFFLINE state This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will be used by transport classes to offline devices for cases like when the fast_io_fail/recovery_tmo fires. In those cases we want all IO to fail, and we have not yet escalated to dev_loss_tmo behavior where we are removing the devices. Currently to handle this state, transport classes are setting the scsi_device's state to running, setting their internal session/port structs state to something that indicates failed, and then failing IO from some transport check in the queuecommand. The reason for the new value is so that users can distinguish between a device failure that is a result of a transport problem vs the wide range of errors that devices get offlined for when a scsi command times out and we offline the devices there. It also fixes the confusion as to why the transport class is failing IO, but has set the device state from blocked to running. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:21 +01:00
Holger Macht	de50ada55b	[SCSI] add wrapper to access and set scsi_bus_type in struct acpi_bus_type For being able to bind ata devices against acpi devices, scsi_bus_type needs to be set as bus in struct acpi_bus_type. So add wrapper to scsi_lib to accomplish that. Signed-off-by: Holger Macht <holger@homac.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2012-06-29 11:38:09 -04:00
Jun'ichi Nomura	b7e94a1686	[SCSI] Fix dm-multipath starvation when scsi host is busy block congestion control doesn't have any concept of fairness across multiple queues. This means that if SCSI reports the host as busy in the queue congestion control it can result in an unfair starvation situation in dm-mp if there are multiple multipath devices on the same host. For example: http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html The fix for this is to report only the sdev busy state (and ignore the host busy state) in the block congestion control call back. The host is still congested, but the SCSI subsystem will sort out the congestion in a fair way because it knows the relation between the queues and the host. [jejb: fixed up trailing whitespace] Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-05-23 09:34:17 +01:00
James Bottomley	e346933365	isci update for 3.5 1/ Rework remote-node-context (RNC) handling for proper management of the silicon state machine in error handling and hot-plug conditions. Further details below, suffice to say if the RNC is mismanaged the silicon state machines may lock up. 2/ Refactor the initialization code to be reused for suspend/resume support 3/ Miscellaneous bug fixes to address discovery issues and hardware compatibility. RNC rework details from Jeff Skirvin: In the controller, devices as they appear on a SAS domain (or direct-attached SATA devices) are represented by memory structures known as "Remote Node Contexts" (RNCs). These structures are transferred from main memory to the controller using a set of register commands; these commands include setting up the context ("posting"), removing the context ("invalidating"), and commands to control the scheduling of commands and connections to that remote device ("suspensions" and "resumptions"). There is a similar path to control RNC scheduling from the protocol engine, which interprets the results of command and data transmission and reception. In general, the controller chooses among non-suspended RNCs to find one that has work requiring scheduling the transmission of command and data frames to a target. Likewise, when a target tries to return data back to the initiator, the state of the RNC is used by the controller to determine how to treat the incoming request. As an example, if the RNC is in the state "TX/RX Suspended", incoming SSP connection requests from the target will be rejected by the controller hardware. When an RNC is "TX Suspended", it will not be selected by the controller hardware to start outgoing command or data operations (with certain priority-based exceptions). As mentioned above, there are two sources for management of the RNC states: commands from driver software, and the result of transmission and reception conditions of commands and data signaled by the controller hardware. As an example of the latter, if an outgoing SSP command ends with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will transition to the "TX Suspended" state, and this is signaled by the controller hardware in the status to the completion of the pending command as well as signaled in a controller hardware event. Examples of the former are included in the patch changelogs. Driver software is required to suspend the RNC in a "TX/RX Suspended" condition before any outstanding commands can be terminated. Failure to guarantee this can lead to a complete hardware hang condition. Earlier versions of the driver software did not guarantee that an RNC was correctly managed before I/O termination, and so operated in an unsafe way. Further, the driver performed unnecessary contortions to preserve the remote device command state and so was more complicated than it needed to be. A simplifying driver assumption is that once an I/O has entered the error handler path without having completed in the target, the requirement on the driver is that all use of the sas_task must end. Beyond that, recovery of operation is dependent on libsas and other components to reset, rediscover and reconfigure the device before normal operation can restart. In the driver, this simplifying assumption meant that the RNC management could be reduced to entry into the suspended state, terminating the targeted I/O request, and resuming the RNC as needed for device-specific management such as an SSP Abort Task or LUN Reset Management request. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJPtYFXAAoJEB7SkWpmfYgCJkcP/1VvsuuitNy/YM9P1tb/RvQ7 ytJzjGtWiAABHVWwjgB+Ng7hUTaP2r6l8KeNfwxpwXyNdBAUNysYEUBHAfPsKzKz espTmw3wVCnREajgKXZwFp9aTj8DcYFB6vKcC/ddACt3uRNjjA9En36+6797r8Vg YdebyjFX2FxwoUj0icTUiV/OXgb8w723imnCl8bfOhhFRi4eFZ4EJ23AdMkUya1i uYePAPJPSQJuU/87gNIx4JcR0qHJ1ziGPEY+XC47CzEeXbBTSPgWOwanQ6KPoXRJ XVxamfcKAjRdtwQ4m1vYBSE32RTdrjhujVbkGiPi6QaEbCLjhLSCIYyuS3XMckV+ TCZ16o5kd/I6ZtZOeP4zZRGnNBkOPzY44qiJeKffjWDhTrFacx4XWJB/ftWPgEA5 N2zFH3RM4sY0FUJ3I/Qe5CERNdCXMtcj+UAf3nHpAIVcv46Lp+qoSkdEx05uuuiN +D/dSlubktuvuzmB5WisL3qrjNEkkLTAGQpZs1j0ojLEBm0XAgV5EzqmHiZ0GOPD OQNFxeei9SlqgtKIIP0bymRispPrG2HVCOvExYMxzKR6fjxofZLAs/aWOsdhxgMq TlAyZJ6OmGI+KX68HHzoMpT9iquvmP64WGkfHzCx296BfSKiruLh/Jzt5gGwv+Z1 5tlpnUr9dUTxx7qkQXvj =HYvO -----END PGP SIGNATURE----- Merge tag 'isci-for-3.5' into misc isci update for 3.5 1/ Rework remote-node-context (RNC) handling for proper management of the silicon state machine in error handling and hot-plug conditions. Further details below, suffice to say if the RNC is mismanaged the silicon state machines may lock up. 2/ Refactor the initialization code to be reused for suspend/resume support 3/ Miscellaneous bug fixes to address discovery issues and hardware compatibility. RNC rework details from Jeff Skirvin: In the controller, devices as they appear on a SAS domain (or direct-attached SATA devices) are represented by memory structures known as "Remote Node Contexts" (RNCs). These structures are transferred from main memory to the controller using a set of register commands; these commands include setting up the context ("posting"), removing the context ("invalidating"), and commands to control the scheduling of commands and connections to that remote device ("suspensions" and "resumptions"). There is a similar path to control RNC scheduling from the protocol engine, which interprets the results of command and data transmission and reception. In general, the controller chooses among non-suspended RNCs to find one that has work requiring scheduling the transmission of command and data frames to a target. Likewise, when a target tries to return data back to the initiator, the state of the RNC is used by the controller to determine how to treat the incoming request. As an example, if the RNC is in the state "TX/RX Suspended", incoming SSP connection requests from the target will be rejected by the controller hardware. When an RNC is "TX Suspended", it will not be selected by the controller hardware to start outgoing command or data operations (with certain priority-based exceptions). As mentioned above, there are two sources for management of the RNC states: commands from driver software, and the result of transmission and reception conditions of commands and data signaled by the controller hardware. As an example of the latter, if an outgoing SSP command ends with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will transition to the "TX Suspended" state, and this is signaled by the controller hardware in the status to the completion of the pending command as well as signaled in a controller hardware event. Examples of the former are included in the patch changelogs. Driver software is required to suspend the RNC in a "TX/RX Suspended" condition before any outstanding commands can be terminated. Failure to guarantee this can lead to a complete hardware hang condition. Earlier versions of the driver software did not guarantee that an RNC was correctly managed before I/O termination, and so operated in an unsafe way. Further, the driver performed unnecessary contortions to preserve the remote device command state and so was more complicated than it needed to be. A simplifying driver assumption is that once an I/O has entered the error handler path without having completed in the target, the requirement on the driver is that all use of the sas_task must end. Beyond that, recovery of operation is dependent on libsas and other components to reset, rediscover and reconfigure the device before normal operation can restart. In the driver, this simplifying assumption meant that the RNC management could be reduced to entry into the suspended state, terminating the targeted I/O request, and resuming the RNC as needed for device-specific management such as an SSP Abort Task or LUN Reset Management request.	2012-05-21 12:17:30 +01:00
Dan Williams	a7a20d1039	[SCSI] sd: limit the scope of the async probe domain sd injects and synchronizes probe work on the global kernel-wide domain. This runs into conflict with PM that wants to perform resume actions in async context: [ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds. [ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000 [ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8 [ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160 [ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398 [ 494.611685] Call Trace: [ 494.632649] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 494.674687] [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112 [ 494.734177] [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50 [ 494.787134] [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48 [ 494.837900] [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17 [ 494.891567] [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete [ 494.943783] [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a [ 495.002547] [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod] [ 495.051861] [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf [ 495.104807] [<ffffffff812fe9bd>] device_release_driver+0x25/0x32 <-- here we take device_lock() [ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds. [ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000 [ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8 [ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000 [ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000 [ 853.886633] Call Trace: [ 853.907631] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 853.949670] [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351 [ 854.001225] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.049082] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.097011] [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock() [ 854.145591] [<ffffffff81304bd7>] device_resume+0x58/0x1c4 [ 854.192066] [<ffffffff81304d61>] async_resume+0x1e/0x45 [ 854.237019] [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context Provide a 'scsi_sd_probe_domain' so that async probe actions actions can be flushed without regard for the state of PM, and allow for the resume path to handle devices that have transitioned from SDEV_QUIESCE to SDEV_DEL prior to resume. Acked-by: Alan Stern <stern@rowland.harvard.edu> [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume] Signed-off-by: Dan Williams <dan.j.williams@intel.com> [jejb: remove unneeded config guards in include file] Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-05-17 09:10:46 +01:00
Lin Ming	6f381fa344	[SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue Currently, __scsi_alloc_queue uses SCSI host's parent device as DMA device to set segment boundary. But the parent device may not refer to the DMA device. For example, for ATA disk, SCSI host's parent device now refers to ATA port. Since commit d139b9b([SCSI] scsi_lib_dma: fix bug with dma maps on nested scsi objects), a new field Scsi_Host->dma_dev was introduced to refer to the real DMA device. Use ->dma_dev in __scsi_alloc_queue to correctly set segment boundary. Bug report: http://marc.info/?l=linux-ide&m=133177818318187&w=2 Reported-and-tested-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-04-22 18:56:18 +01:00
Linus Torvalds	424a6f6ef9	SCSI updates on 20120319 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE= =1TxB -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 SCSI updates from James Bottomley: "The update includes the usual assortment of driver updates (lpfc, qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge amount of infrastructure work in the SAS library and transport class as well as an iSCSI update. There's also a new SCSI based virtio driver." * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits) [SCSI] qla4xxx: Update driver version to 5.02.00-k15 [SCSI] qla4xxx: trivial cleanup [SCSI] qla4xxx: Fix sparse warning [SCSI] qla4xxx: Add support for multiple session per host. [SCSI] qla4xxx: Export CHAP index as sysfs attribute [SCSI] scsi_transport: Export CHAP index as sysfs attribute [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry [SCSI] pm8001: fix endian issue with code optimization. [SCSI] pm8001: Fix possible racing condition. [SCSI] pm8001: Fix bogus interrupt state flag issue. [SCSI] ipr: update PCI ID definitions for new adapters [SCSI] qla2xxx: handle default case in qla2x00_request_firmware() [SCSI] isci: improvements in driver unloading routine [SCSI] isci: improve phy event warnings [SCSI] isci: debug, provide state-enum-to-string conversions [SCSI] scsi_transport_sas: 'enable' phys on reset [SCSI] libsas: don't recover end devices attached to disabled phys [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata [SCSI] libsas: set attached device type and target protocols for local phys ...	2012-03-22 12:55:29 -07:00
Cong Wang	77dfce076c	scsi: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:19 +08:00
Martin K. Petersen	66a651aa7a	[SCSI] Ensure discard failure gets treated as a target problem The error reported up the stack for a discard failure did not clearly indicate that the command was processed and subsequently failed by the target device. Return -EREMOTEIO so multipathing does not classify this condition as a path failure. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-02-19 09:38:01 -06:00
Moger, Babu	2082ebc45a	[SCSI] fix the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE) This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset the host byte to DID_OK. But that does not happen because of the OR operation. Here is the flow. scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, result will be set as DID_NEXUS_FAILURE (=0x11). Then in __scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset it back to DID_OK. But that does not happen. This patch fixes this issue. Signed-off-by: Babu Moger <babu.moger@netapp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-02-19 08:08:59 -06:00
Shaohua Li	466c08c71a	[SCSI] don't change sdev starvation list order without request dispatched The sdev is deleted from starved list and then try to dispatch from this device. It's quite possible the sdev can't eventually dispatch a request, then the sdev will be in starved list tail. This isn't fair. There are two cases here: 1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then the dev is removed from starved list, but quite possible host queue isn't ready, the dev is moved to starved list without dispatching any request. 2. scsi_run_queue path. It deletes the dev from starved list first (both global and local starved lists), then handles the dev. Then we could have the same process like case 1. This patch fixes the first case. Case 2 isn't fixed, because there is a rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds host is busy (other CPU is faster to get host queue depth). Not deleting the dev from starved list in scsi_run_queue will keep scsi_run_queue looping (though this is very rare case, because host will become busy). Fortunately fixing case 1 already gives big improvement for starvation in my test. In a 12 disk JBOD setup, running file creation under EXT4, this gives 12% more throughput. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-01-16 11:54:04 +04:00
Hannes Reinecke	745718132c	[SCSI] Silencing 'killing requests for dead queue' When we tear down a device we try to flush all outstanding commands in scsi_free_queue(). However the check in scsi_request_fn() is imperfect as it only signals that we _might start_ aborting commands, not that we've actually aborted some. So move the printk inside the scsi_kill_request function, this will also give us a hint about which commands are aborted. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-11-09 12:07:07 -06:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Paul Gortmaker	09703660ed	scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required For the basic SCSI infrastructure files that are exporting symbols but not modules themselves, add in the basic export.h header file to allow the exports. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:23 -04:00
Bart Van Assche	3308511c93	[SCSI] Make scsi_free_queue() kill pending SCSI commands Make sure that SCSI device removal via scsi_remove_host() does finish all pending SCSI commands. Currently that's not the case and hence removal of a SCSI host during I/O can cause a deadlock. See also "blkdev_issue_discard() hangs forever if underlying storage device is removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also http://lkml.org/lkml/2011/8/27/6. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-30 13:20:28 +04:00
James Smart	573e591353	[SCSI] scsi_lib: pause between error retries During cable pull tests on our 16G FC adapter, we are seeing errors, typically reads to close targets, which fail due to CRC or framing errors caused by the cable being pull (return status DID_ERROR). The adapter detects the error on one of the first frames received, marks the FC exchange as dead (further frames go to bit bucket) and signals the host of the error. This action is so quick, and coupled with fast host CPUs, creates a scenario in which the midlayer sees the failure and retries the io almost immediately. We've seen link traces with the retry on the link while the original i/o is still being processed by the target. We're also seeing the time window for the "link to pull-apart" and the physical interface to report disconnected to be in the few millisecond range. Which means, we're encountering scenarios where the full retry count is exhausted (all with error) by the midlayer before the link disconnect state is detected. We looked at 8G FC behavior and occasionally see the same behavior, but as the link was slower, it rarely could exhaust all retries before the link reported disconnect. What is needed is a slight delay between io retries due to DID_ERROR to cover this error. It is inappropriate to put this delay in the driver, as the error is indistinguishable from other link-related errors, nor does the driver track whether the io is a retry or not. This is also easier than tracking between-io-error bursts that are seen in this scenario. The patch below updates the retry path so that it inserts a delay as if the target was busy. The busy delay is on the order of 6ms. This delay is sufficient to ensure the link down condition is reported before the retry count is exhausted (at most 1 retry is seen). Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-07-27 14:06:01 +04:00
James Bottomley	bfe159a512	[SCSI] fix crash in scsi_dispatch_cmd() USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-07-21 14:21:18 -07:00
Linus Torvalds	a2b9c1f620	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: don't delay blk_run_queue_async scsi: remove performance regression due to async queue run blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup block: rescan partitions on invalidated devices on -ENOMEDIA too cdrom: always check_disk_change() on open block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers	2011-05-18 06:49:02 -07:00
Jens Axboe	9937a5e2f3	scsi: remove performance regression due to async queue run Commit `c21e6beb` removed our queue request_fn re-enter protection, and defaulted to always running the queues from kblockd to be safe. This was a known potential slow down, but should be safe. Unfortunately this is causing big performance regressions for some, so we need to improve this logic. Looking into the details of the re-enter, the real issue is on requeue of requests. Requeue of requests upon seeing a BUSY condition from the device ends up re-running the queue, causing traces like this: scsi_request_fn() scsi_dispatch_cmd() scsi_queue_insert() __scsi_queue_insert() scsi_run_queue() scsi_request_fn() ... potentially causing the issue we want to avoid. So special case the requeue re-run of the queue, but improve it to offload the entire run of local queue and starved queue from a single workqueue callback. This is a lot better than potentially kicking off a workqueue run for each device seen. This also fixes the issue of the local device going into recursion, since the above mentioned commit never moved that queue run out of line. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-05-17 11:04:44 +02:00
James Bottomley	c055f5b261	[SCSI] fix oops in scsi_run_queue() The recent commit closing the race window in device teardown: commit `86cbfb5607` Author: James Bottomley <James.Bottomley@suse.de> Date: Fri Apr 22 10:39:59 2011 -0500 [SCSI] put stricter guards on queue dead checks is causing a potential NULL deref in scsi_run_queue() because the q->queuedata may already be NULL by the time this function is called. Since we shouldn't be running a queue that is being torn down, simply add a NULL check in scsi_run_queue() to forestall this. Tested-by: Jim Schutt <jaschut@sandia.gov> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-05-03 15:30:00 -05:00
Jens Axboe	c21e6beba8	block: get rid of QUEUE_FLAG_REENTER We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI. So export the async queue run and convert SCSI over to use that. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue and they can fix that up in due time. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-19 13:32:46 +02:00
Christoph Hellwig	24ecfbe27f	block: add blk_run_queue_async Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-18 11:41:33 +02:00
Linus Torvalds	6c51038900	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}	2011-03-24 10:16:26 -07:00
Linus Torvalds	c55d267de2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}	2011-03-17 17:54:40 -07:00
Martin K. Petersen	c98a0eb0e9	[SCSI] sd: Logical Block Provisioning update SBC3r26 contains many changes to the Logical Block Provisioning interfaces (formerly known as Thin Provisioning ditto). This patch implements support for both the old and new schemes using the same heuristic as before (whether the LBP VPD page is present). The new code also allows the provisioning mode (i.e. choice of command) to be overridden on a per-device basis via sysfs. Two additional modes are supported in this version: - WRITE SAME(10) with the UNMAP bit set - WRITE SAME(10) without the UNMAP bit set. This allows us to support devices that predate the TP/LBP enhancements in SBC3 and which work by way zero-detection Switching between modes has been consolidated in a helper function that also updates the block layer topology according to the limitations of the chosen command. I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10) if WRITE SAME(16) fails, etc. but found several devices that got cranky. So for now we'll disable discard if one of the commands fail. The user still has the option of selecting a different mode in sysfs. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-03-14 18:37:34 -05:00
Martin K. Petersen	72f7d322fd	[SCSI] Include protection operation in SCSI command trace When debugging DIF/DIX it is very helpful to be able to see which DIX operation is associated with the scsi_cmnd. Include the protection op in the SCSI command trace. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-03-14 18:36:02 -05:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	a488e74976	scsi: convert to blk_delay_queue() It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:45:54 +01:00

1 2 3 4 5 ...

281 Commits