From f78cff48c3d13636705851b0a9e733db117d58ee Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:38 -0800 Subject: [PATCH 01/14] doc: Add x86 CPU0 online/offline feature If CONFIG_BOOTPARAM_HOTPLUG_CPU0 is turned on, CPU0 is hotpluggable. Otherwise, by default CPU0 is not hotpluggable and kernel parameter cpu0_hotplug enables CPU0 online/offline feature. The documentations point out two known CPU0 dependencies. First, resume from hibernate or suspend always starts from CPU0. So hibernate and suspend are prevented if CPU0 is offline. Another dependency is PIC interrupts always go to CPU0. It's said that some machines may depend on CPU0 to poweroff/reboot. But I haven't seen such dependency on a few tested machines. Please let me know if you see any CPU0 dependencies on your machine. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-2-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- Documentation/cpu-hotplug.txt | 24 ++++++++++++++++++++++++ Documentation/kernel-parameters.txt | 14 ++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 66ef8f35613..9f401350f50 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -207,6 +207,30 @@ by making it not-removable. In such cases you will also notice that the online file is missing under cpu0. +Q: Is CPU0 removable on X86? +A: Yes. If kernel is compiled with CONFIG_BOOTPARAM_HOTPLUG_CPU0=y, CPU0 is +removable by default. Otherwise, CPU0 is also removable by kernel option +cpu0_hotplug. + +But some features depend on CPU0. Two known dependencies are: + +1. Resume from hibernate/suspend depends on CPU0. Hibernate/suspend will fail if +CPU0 is offline and you need to online CPU0 before hibernate/suspend can +continue. +2. PIC interrupts also depend on CPU0. CPU0 can't be removed if a PIC interrupt +is detected. + +It's said poweroff/reboot may depend on CPU0 on some machines although I haven't +seen any poweroff/reboot failure so far after CPU0 is offline on a few tested +machines. + +Please let me know if you know or see any other dependencies of CPU0. + +If the dependencies are under your control, you can turn on CPU0 hotplug feature +either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug. + +--Fenghua Yu + Q: How do i find out if a particular CPU is not removable? A: Depending on the implementation, some architectures may show this by the absence of the "online" file. This is done if it can be determined ahead of diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9776f068306..f7cbe1d98cb 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1984,6 +1984,20 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nox2apic [X86-64,APIC] Do not enable x2APIC mode. + cpu0_hotplug [X86] Turn on CPU0 hotplug feature when + CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off. + Some features depend on CPU0. Known dependencies are: + 1. Resume from suspend/hibernate depends on CPU0. + Suspend/hibernate will fail if CPU0 is offline and you + need to online CPU0 before suspend/hibernate. + 2. PIC interrupts also depend on CPU0. CPU0 can't be + removed if a PIC interrupt is detected. + It's said poweroff/reboot may depend on CPU0 on some + machines although I haven't seen such issues so far + after CPU0 is offline on a few tested machines. + If the dependencies are under your control, you can + turn on cpu0_hotplug. + nptcg= [IA-64] Override max number of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or SAL PALO. From 80aa1dff65717e7518647d4e27d1d3dcea5818e6 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:39 -0800 Subject: [PATCH 02/14] x86, Kconfig: Add config switch for CPU0 hotplug New config switch CONFIG_BOOTPARAM_HOTPLUG_CPU0 sets default state of whether the CPU0 hotplug is on or off. If the switch is off, CPU0 is not hotpluggable by default. But the CPU0 hotplug feature can still be turned on by kernel parameter cpu0_hotplug at boot. If the switch is on, CPU0 is always hotpluggable. The default value of the switch is off. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/Kconfig | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 46c3bff3ced..036e89ab470 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1698,6 +1698,35 @@ config HOTPLUG_CPU automatically on SMP systems. ) Say N if you want to disable CPU hotplug. +config BOOTPARAM_HOTPLUG_CPU0 + bool "Set default setting of cpu0_hotpluggable" + default n + depends on HOTPLUG_CPU && EXPERIMENTAL + ---help--- + Set whether default state of cpu0_hotpluggable is on or off. + + Say Y here to enable CPU0 hotplug by default. If this switch + is turned on, there is no need to give cpu0_hotplug kernel + parameter and the CPU0 hotplug feature is enabled by default. + + Please note: there are two known CPU0 dependencies if you want + to enable the CPU0 hotplug feature either by this switch or by + cpu0_hotplug kernel parameter. + + First, resume from hibernate or suspend always starts from CPU0. + So hibernate and suspend are prevented if CPU0 is offline. + + Second dependency is PIC interrupts always go to CPU0. CPU0 can not + offline if any interrupt can not migrate out of CPU0. There may + be other CPU0 dependencies. + + Please make sure the dependencies are under your control before + you enable this feature. + + Say N if you don't want to enable CPU0 hotplug feature by default. + You still can enable the CPU0 hotplug feature at boot by kernel + parameter cpu0_hotplug. + config COMPAT_VDSO def_bool y prompt "Compat VDSO support" From 4d25031a81d3cd32edc00de6596db76cc4010685 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:40 -0800 Subject: [PATCH 03/14] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is enabled by default. If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug feature is not enabled by default. The kernel parameter cpu0_hotplug can enable CPU0 hotplug feature at boot. Currently the feature is supported on Intel platforms only. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-4-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/topology.c | 52 ++++++++++++++++++++++++++++++++------ 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c index 76ee97709a0..0e7b4a7a7fb 100644 --- a/arch/x86/kernel/topology.c +++ b/arch/x86/kernel/topology.c @@ -30,23 +30,59 @@ #include #include #include +#include #include static DEFINE_PER_CPU(struct x86_cpu, cpu_devices); #ifdef CONFIG_HOTPLUG_CPU + +#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0 +static int cpu0_hotpluggable = 1; +#else +static int cpu0_hotpluggable; +static int __init enable_cpu0_hotplug(char *str) +{ + cpu0_hotpluggable = 1; + return 1; +} + +__setup("cpu0_hotplug", enable_cpu0_hotplug); +#endif + int __ref arch_register_cpu(int num) { + struct cpuinfo_x86 *c = &cpu_data(num); + /* - * CPU0 cannot be offlined due to several - * restrictions and assumptions in kernel. This basically - * doesn't add a control file, one cannot attempt to offline - * BSP. - * - * Also certain PCI quirks require not to enable hotplug control - * for all CPU's. + * Currently CPU0 is only hotpluggable on Intel platforms. Other + * vendors can add hotplug support later. */ - if (num) + if (c->x86_vendor != X86_VENDOR_INTEL) + cpu0_hotpluggable = 0; + + /* + * Two known BSP/CPU0 dependencies: Resume from suspend/hibernate + * depends on BSP. PIC interrupts depend on BSP. + * + * If the BSP depencies are under control, one can tell kernel to + * enable BSP hotplug. This basically adds a control file and + * one can attempt to offline BSP. + */ + if (num == 0 && cpu0_hotpluggable) { + unsigned int irq; + /* + * We won't take down the boot processor on i386 if some + * interrupts only are able to be serviced by the BSP in PIC. + */ + for_each_active_irq(irq) { + if (!IO_APIC_IRQ(irq) && irq_has_action(irq)) { + cpu0_hotpluggable = 0; + break; + } + } + } + if (num || cpu0_hotpluggable) per_cpu(cpu_devices, num).cpu.hotpluggable = 1; return register_cpu(&per_cpu(cpu_devices, num).cpu, num); From 30106c174311b8cfaaa3186c7f6f9c36c62d17da Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:41 -0800 Subject: [PATCH 04/14] x86, hotplug: Support functions for CPU0 online/offline Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time. Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after it's offline. Continue to online CPU0 in native_cpu_up(). Continue to offline CPU0 in native_cpu_disable(). Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/smp.h | 1 + arch/x86/kernel/smpboot.c | 38 ++++++++++++++++++-------------------- 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 4f19a152603..b073aaea747 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -166,6 +166,7 @@ void native_send_call_func_ipi(const struct cpumask *mask); void native_send_call_func_single_ipi(int cpu); void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle); +void smp_store_boot_cpu_info(void); void smp_store_cpu_info(int id); #define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c80a33bc528..c297907f3c7 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -125,8 +125,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_info); atomic_t init_deasserted; /* - * Report back to the Boot Processor. - * Running on AP. + * Report back to the Boot Processor during boot time or to the caller processor + * during CPU online. */ static void __cpuinit smp_callin(void) { @@ -279,19 +279,30 @@ notrace static void __cpuinit start_secondary(void *unused) cpu_idle(); } +void __init smp_store_boot_cpu_info(void) +{ + int id = 0; /* CPU 0 */ + struct cpuinfo_x86 *c = &cpu_data(id); + + *c = boot_cpu_data; + c->cpu_index = id; +} + /* * The bootstrap kernel entry code has set these up. Save them for * a given CPU */ - void __cpuinit smp_store_cpu_info(int id) { struct cpuinfo_x86 *c = &cpu_data(id); *c = boot_cpu_data; c->cpu_index = id; - if (id != 0) - identify_secondary_cpu(c); + /* + * During boot time, CPU0 has this setup already. Save the info when + * bringing up AP or offlined CPU0. + */ + identify_secondary_cpu(c); } static bool __cpuinit @@ -795,7 +806,7 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle) pr_debug("++++++++++++++++++++=_---CPU UP %u\n", cpu); - if (apicid == BAD_APICID || apicid == boot_cpu_physical_apicid || + if (apicid == BAD_APICID || !physid_isset(apicid, phys_cpu_present_map) || !apic->apic_id_valid(apicid)) { pr_err("%s: bad cpu %d\n", __func__, cpu); @@ -990,7 +1001,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) /* * Setup boot CPU information */ - smp_store_cpu_info(0); /* Final full version of the data */ + smp_store_boot_cpu_info(); /* Final full version of the data */ cpumask_copy(cpu_callin_mask, cpumask_of(0)); mb(); @@ -1214,19 +1225,6 @@ void cpu_disable_common(void) int native_cpu_disable(void) { - int cpu = smp_processor_id(); - - /* - * Perhaps use cpufreq to drop frequency, but that could go - * into generic code. - * - * We won't take down the boot processor on i386 due to some - * interrupts only being able to be serviced by the BSP. - * Especially so if we're not using an IOAPIC -zwane - */ - if (cpu == 0) - return -EBUSY; - clear_local_APIC(); cpu_disable_common(); From 209efae12981f3d2d694499b761def10895c078c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:42 -0800 Subject: [PATCH 05/14] x86, hotplug, suspend: Online CPU0 for suspend or hibernate Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't be executed if CPU0 is detected offline. To make suspend or hibernate and further resume succeed, CPU0 must be online. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/power/cpu.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 218cdb16163..adde77588e2 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -237,3 +237,47 @@ void restore_processor_state(void) #ifdef CONFIG_X86_32 EXPORT_SYMBOL(restore_processor_state); #endif + +/* + * When bsp_check() is called in hibernate and suspend, cpu hotplug + * is disabled already. So it's unnessary to handle race condition between + * cpumask query and cpu hotplug. + */ +static int bsp_check(void) +{ + if (cpumask_first(cpu_online_mask) != 0) { + pr_warn("CPU0 is offline.\n"); + return -ENODEV; + } + + return 0; +} + +static int bsp_pm_callback(struct notifier_block *nb, unsigned long action, + void *ptr) +{ + int ret = 0; + + switch (action) { + case PM_SUSPEND_PREPARE: + case PM_HIBERNATION_PREPARE: + ret = bsp_check(); + break; + default: + break; + } + return notifier_from_errno(ret); +} + +static int __init bsp_pm_check_init(void) +{ + /* + * Set this bsp_pm_callback as lower priority than + * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called + * earlier to disable cpu hotplug before bsp online check. + */ + pm_notifier(bsp_pm_callback, -INT_MAX); + return 0; +} + +core_initcall(bsp_pm_check_init); From 6e32d479db6079dd5d4309aa66aecbcf2664a5fe Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:43 -0800 Subject: [PATCH 06/14] kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback cpu_hotplug_pm_callback should have higher priority than bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug to avoid race during bsp online checking. This is to hightlight the priorities between the two callbacks in case people may overlook the order. Ideally the priorities should be defined in macro/enum instead of fixed values. To do that, a seperate patchset may be pushed which will touch serveral other generic files and is out of scope of this patchset. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com Reviewed-by: Srivatsa S. Bhat Acked-by: Rafael J. Wysocki Signed-off-by: H. Peter Anvin --- kernel/cpu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index 42bd331ee0a..a2491a2d62b 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -601,6 +601,11 @@ cpu_hotplug_pm_callback(struct notifier_block *nb, static int __init cpu_hotplug_pm_sync_init(void) { + /* + * cpu_hotplug_pm_callback has higher priority than x86 + * bsp_pm_callback which depends on cpu_hotplug_pm_callback + * to disable cpu hotplug to avoid cpu hotplug race. + */ pm_notifier(cpu_hotplug_pm_callback, 0); return 0; } From 42e78e9719aa0c76711e2731b19c90fe5ae05278 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:44 -0800 Subject: [PATCH 07/14] x86-64, hotplug: Add start_cpu0() entry point to head_64.S start_cpu0() is defined in head_64.S for 64-bit. The function sets up stack and jumps to start_secondary() for CPU0 wake up. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/head_64.S | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 94bf9cc2c7e..980053c4b9c 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -252,6 +252,22 @@ ENTRY(secondary_startup_64) pushq %rax # target address in negative space lretq +#ifdef CONFIG_HOTPLUG_CPU +/* + * Boot CPU0 entry point. It's called from play_dead(). Everything has been set + * up already except stack. We just set up stack here. Then call + * start_secondary(). + */ +ENTRY(start_cpu0) + movq stack_start(%rip),%rsp + movq initial_code(%rip),%rax + pushq $0 # fake return address to stop unwinder + pushq $__KERNEL_CS # set correct cs + pushq %rax # target address in negative space + lretq +ENDPROC(start_cpu0) +#endif + /* SMP bootup changes these two */ __REFDATA .align 8 From 3e2a0cc3cdc19e0518ae87583add40ea1bf55b67 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:45 -0800 Subject: [PATCH 08/14] x86-32, hotplug: Add start_cpu0() entry point to head_32.S start_cpu0() is defined in head_32.S for 32-bit. The function sets up stack and jumps to start_secondary() for CPU0 wake up. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-9-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/head_32.S | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 957a47aec64..a013e7390ab 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -266,6 +266,19 @@ num_subarch_entries = (. - subarch_entries) / 4 jmp default_entry #endif /* CONFIG_PARAVIRT */ +#ifdef CONFIG_HOTPLUG_CPU +/* + * Boot CPU0 entry point. It's called from play_dead(). Everything has been set + * up already except stack. We just set up stack here. Then call + * start_secondary(). + */ +ENTRY(start_cpu0) + movl stack_start, %ecx + movl %ecx, %esp + jmp *(initial_code) +ENDPROC(start_cpu0) +#endif + /* * Non-boot CPU entry point; entered from trampoline.S * We can't lgdt here, because lgdt itself uses a data segment, but From e1c467e69040c3be68959332959c07fb3d818e87 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Wed, 14 Nov 2012 04:36:53 -0800 Subject: [PATCH 09/14] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap code which is not a desired behavior for waking up BSP. To avoid the boot-strap code, wake up CPU0 by NMI instead. This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e. physically hot removed and then hot added), NMI won't wake it up. We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available. AP is still waken up as before by INIT, SIPI, SIPI sequence. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/cpu.h | 1 + arch/x86/kernel/smpboot.c | 111 ++++++++++++++++++++++++++++++++++--- 2 files changed, 105 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index 4564c8e28a3..a1195726e8c 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -28,6 +28,7 @@ struct x86_cpu { #ifdef CONFIG_HOTPLUG_CPU extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); +extern void __cpuinit start_cpu0(void); #endif DECLARE_PER_CPU(int, cpu_state); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c297907f3c7..ef53e667e05 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -138,15 +138,17 @@ static void __cpuinit smp_callin(void) * we may get here before an INIT-deassert IPI reaches * our local APIC. We have to wait for the IPI or we'll * lock up on an APIC access. + * + * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI. */ - if (apic->wait_for_init_deassert) + cpuid = smp_processor_id(); + if (apic->wait_for_init_deassert && cpuid != 0) apic->wait_for_init_deassert(&init_deasserted); /* * (This works even if the APIC is not enabled.) */ phys_id = read_apic_id(); - cpuid = smp_processor_id(); if (cpumask_test_cpu(cpuid, cpu_callin_mask)) { panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__, phys_id, cpuid); @@ -228,6 +230,8 @@ static void __cpuinit smp_callin(void) cpumask_set_cpu(cpuid, cpu_callin_mask); } +static int cpu0_logical_apicid; +static int enable_start_cpu0; /* * Activate a secondary processor. */ @@ -243,6 +247,8 @@ notrace static void __cpuinit start_secondary(void *unused) preempt_disable(); smp_callin(); + enable_start_cpu0 = 0; + #ifdef CONFIG_X86_32 /* switch away from the initial page table */ load_cr3(swapper_pg_dir); @@ -492,7 +498,7 @@ void __inquire_remote_apic(int apicid) * won't ... remember to clear down the APIC, etc later. */ int __cpuinit -wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip) +wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip) { unsigned long send_status, accept_status = 0; int maxlvt; @@ -500,7 +506,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip) /* Target chip */ /* Boot on the stack */ /* Kick the second */ - apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid); + apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid); pr_debug("Waiting for send to finish...\n"); send_status = safe_apic_wait_icr_idle(); @@ -660,6 +666,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid) node, cpu, apicid); } +static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs) +{ + int cpu; + + cpu = smp_processor_id(); + if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0) + return NMI_HANDLED; + + return NMI_DONE; +} + +/* + * Wake up AP by INIT, INIT, STARTUP sequence. + * + * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS + * boot-strap code which is not a desired behavior for waking up BSP. To + * void the boot-strap code, wake up CPU0 by NMI instead. + * + * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined + * (i.e. physically hot removed and then hot added), NMI won't wake it up. + * We'll change this code in the future to wake up hard offlined CPU0 if + * real platform and request are available. + */ +static int __cpuinit +wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid, + int *cpu0_nmi_registered) +{ + int id; + int boot_error; + + /* + * Wake up AP by INIT, INIT, STARTUP sequence. + */ + if (cpu) + return wakeup_secondary_cpu_via_init(apicid, start_ip); + + /* + * Wake up BSP by nmi. + * + * Register a NMI handler to help wake up CPU0. + */ + boot_error = register_nmi_handler(NMI_LOCAL, + wakeup_cpu0_nmi, 0, "wake_cpu0"); + + if (!boot_error) { + enable_start_cpu0 = 1; + *cpu0_nmi_registered = 1; + if (apic->dest_logical == APIC_DEST_LOGICAL) + id = cpu0_logical_apicid; + else + id = apicid; + boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip); + } + + return boot_error; +} + /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad * (ie clustered apic addressing mode), this is a LOGICAL apic ID. @@ -675,6 +738,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle) unsigned long boot_error = 0; int timeout; + int cpu0_nmi_registered = 0; /* Just in case we booted with a single CPU. */ alternatives_enable_smp(); @@ -722,13 +786,16 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle) } /* - * Kick the secondary CPU. Use the method in the APIC driver - * if it's defined - or use an INIT boot APIC message otherwise: + * Wake up a CPU in difference cases: + * - Use the method in the APIC driver if it's defined + * Otherwise, + * - Use an INIT boot APIC message for APs or NMI for BSP. */ if (apic->wakeup_secondary_cpu) boot_error = apic->wakeup_secondary_cpu(apicid, start_ip); else - boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip); + boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, + &cpu0_nmi_registered); if (!boot_error) { /* @@ -793,6 +860,13 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle) */ smpboot_restore_warm_reset_vector(); } + /* + * Clean up the nmi handler. Do this after the callin and callout sync + * to avoid impact of possible long unregister time. + */ + if (cpu0_nmi_registered) + unregister_nmi_handler(NMI_LOCAL, "wake_cpu0"); + return boot_error; } @@ -1037,6 +1111,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) */ setup_local_APIC(); + if (x2apic_mode) + cpu0_logical_apicid = apic_read(APIC_LDR); + else + cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR)); + /* * Enable IO APIC before setting up error vector */ @@ -1264,6 +1343,14 @@ void play_dead_common(void) local_irq_disable(); } +static bool wakeup_cpu0(void) +{ + if (smp_processor_id() == 0 && enable_start_cpu0) + return true; + + return false; +} + /* * We need to flush the caches before going to sleep, lest we have * dirty data in our caches when we come back up. @@ -1327,6 +1414,11 @@ static inline void mwait_play_dead(void) __monitor(mwait_ptr, 0, 0); mb(); __mwait(eax, 0); + /* + * If NMI wants to wake up CPU0, start CPU0. + */ + if (wakeup_cpu0()) + start_cpu0(); } } @@ -1337,6 +1429,11 @@ static inline void hlt_play_dead(void) while (1) { native_halt(); + /* + * If NMI wants to wake up CPU0, start CPU0. + */ + if (wakeup_cpu0()) + start_cpu0(); } } From 27fd185f3d9e83c64b7a596881d7751a638c9683 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:47 -0800 Subject: [PATCH 10/14] x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. Previously these functions were not run on the BSP (CPU 0, the boot processor) since the boot processor init would only be executed before this functionality was initialized. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-11-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/cpu/common.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 7505f7b13e7..ca165ac6793 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1237,7 +1237,7 @@ void __cpuinit cpu_init(void) oist = &per_cpu(orig_ist, cpu); #ifdef CONFIG_NUMA - if (cpu != 0 && this_cpu_read(numa_node) == 0 && + if (this_cpu_read(numa_node) == 0 && early_cpu_to_node(cpu) != NUMA_NO_NODE) set_numa_node(early_cpu_to_node(cpu)); #endif @@ -1269,8 +1269,7 @@ void __cpuinit cpu_init(void) barrier(); x86_configure_nx(); - if (cpu != 0) - enable_x2apic(); + enable_x2apic(); /* * set up and load the per-CPU TSS From 30242aa6023b71325c6b8addac06faf544a85fd0 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:48 -0800 Subject: [PATCH 11/14] x86, hotplug: The first online processor saves the MTRR state Ask the first online CPU to save mtrr instead of asking BSP. BSP could be offline when mtrr_save_state() is called. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-12-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/cpu/mtrr/main.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c index 6b96110bb0c..e4c1a418453 100644 --- a/arch/x86/kernel/cpu/mtrr/main.c +++ b/arch/x86/kernel/cpu/mtrr/main.c @@ -695,11 +695,16 @@ void mtrr_ap_init(void) } /** - * Save current fixed-range MTRR state of the BSP + * Save current fixed-range MTRR state of the first cpu in cpu_online_mask. */ void mtrr_save_state(void) { - smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1); + int first_cpu; + + get_online_cpus(); + first_cpu = cpumask_first(cpu_online_mask); + smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1); + put_online_cpus(); } void set_mtrr_aps_delayed_init(void) From 8d966a04107e56993a051cd41ead0b4f23ba2414 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:49 -0800 Subject: [PATCH 12/14] x86, hotplug: Handle retrigger irq by the first available CPU The first cpu in irq cfg->domain is likely to be CPU 0 and may not be available when CPU 0 is offline. Instead of using CPU 0 to handle retriggered irq, we use first available CPU which is online and in this irq's domain. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-13-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/apic/io_apic.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 1817fa91102..f78fc2b4deb 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2199,9 +2199,11 @@ static int ioapic_retrigger_irq(struct irq_data *data) { struct irq_cfg *cfg = data->chip_data; unsigned long flags; + int cpu; raw_spin_lock_irqsave(&vector_lock, flags); - apic->send_IPI_mask(cpumask_of(cpumask_first(cfg->domain)), cfg->vector); + cpu = cpumask_first_and(cfg->domain, cpu_online_mask); + apic->send_IPI_mask(cpumask_of(cpu), cfg->vector); raw_spin_unlock_irqrestore(&vector_lock, flags); return 1; From 6f5298c2139b06925037490367906f3d73955b86 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:50 -0800 Subject: [PATCH 13/14] x86/i387.c: Initialize thread xstate only on CPU0 only once init_thread_xstate() is only called once to avoid overriding xstate_size during boot time or during CPU hotplug. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-14-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/i387.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index 675a0501244..245a71db401 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -175,7 +175,11 @@ void __cpuinit fpu_init(void) cr0 |= X86_CR0_EM; write_cr0(cr0); - if (!smp_processor_id()) + /* + * init_thread_xstate is only called once to avoid overriding + * xstate_size during boot time or during CPU hotplug. + */ + if (xstate_size == 0) init_thread_xstate(); mxcsr_feature_mask_init(); From a71c8bc5dfefbbf80ef90739791554ef7ea4401b Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Tue, 13 Nov 2012 11:32:51 -0800 Subject: [PATCH 14/14] x86, topology: Debug CPU0 hotplug CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined. User can online CPU0 back after boot time. The default value of the switch is off. To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving cpu0_hotplug kernel parameter at boot. It's safe and early place to take down CPU0 after all hotplug notifiers are installed and SMP is booted. Please note that some applications or drivers, e.g. some versions of udevd, during boot time may put CPU0 online again in this CPU0 hotplug debug mode. In this debug mode, setup_local_APIC() may report a warning on max_loops<=0 when CPU0 is onlined back after boot time. This is because pending interrupt in IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on other CPUs as well. It is harmless except the first CPU0 online takes a bit longer time. And so this debug mode is useful to expose this issue. I'll send a seperate patch to fix this generic warning issue. Signed-off-by: Fenghua Yu Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin --- arch/x86/Kconfig | 15 +++++++++++ arch/x86/include/asm/cpu.h | 3 +++ arch/x86/kernel/topology.c | 51 ++++++++++++++++++++++++++++++++++++++ arch/x86/power/cpu.c | 38 ++++++++++++++++++++++++++++ 4 files changed, 107 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 036e89ab470..b6cfa5f6252 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1727,6 +1727,21 @@ config BOOTPARAM_HOTPLUG_CPU0 You still can enable the CPU0 hotplug feature at boot by kernel parameter cpu0_hotplug. +config DEBUG_HOTPLUG_CPU0 + def_bool n + prompt "Debug CPU0 hotplug" + depends on HOTPLUG_CPU && EXPERIMENTAL + ---help--- + Enabling this option offlines CPU0 (if CPU0 can be offlined) as + soon as possible and boots up userspace with CPU0 offlined. User + can online CPU0 back after boot time. + + To debug CPU0 hotplug, you need to enable CPU0 offline/online + feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during + compilation or giving cpu0_hotplug kernel parameter at boot. + + If unsure, say N. + config COMPAT_VDSO def_bool y prompt "Compat VDSO support" diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index a1195726e8c..5f9a1243190 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -29,6 +29,9 @@ struct x86_cpu { extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); extern void __cpuinit start_cpu0(void); +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0 +extern int _debug_hotplug_cpu(int cpu, int action); +#endif #endif DECLARE_PER_CPU(int, cpu_state); diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c index 0e7b4a7a7fb..6e60b5fe224 100644 --- a/arch/x86/kernel/topology.c +++ b/arch/x86/kernel/topology.c @@ -50,6 +50,57 @@ static int __init enable_cpu0_hotplug(char *str) __setup("cpu0_hotplug", enable_cpu0_hotplug); #endif +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0 +/* + * This function offlines a CPU as early as possible and allows userspace to + * boot up without the CPU. The CPU can be onlined back by user after boot. + * + * This is only called for debugging CPU offline/online feature. + */ +int __ref _debug_hotplug_cpu(int cpu, int action) +{ + struct device *dev = get_cpu_device(cpu); + int ret; + + if (!cpu_is_hotpluggable(cpu)) + return -EINVAL; + + cpu_hotplug_driver_lock(); + + switch (action) { + case 0: + ret = cpu_down(cpu); + if (!ret) { + pr_info("CPU %u is now offline\n", cpu); + kobject_uevent(&dev->kobj, KOBJ_OFFLINE); + } else + pr_debug("Can't offline CPU%d.\n", cpu); + break; + case 1: + ret = cpu_up(cpu); + if (!ret) + kobject_uevent(&dev->kobj, KOBJ_ONLINE); + else + pr_debug("Can't online CPU%d.\n", cpu); + break; + default: + ret = -EINVAL; + } + + cpu_hotplug_driver_unlock(); + + return ret; +} + +static int __init debug_hotplug_cpu(void) +{ + _debug_hotplug_cpu(0, 0); + return 0; +} + +late_initcall_sync(debug_hotplug_cpu); +#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */ + int __ref arch_register_cpu(int num) { struct cpuinfo_x86 *c = &cpu_data(num); diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index adde77588e2..120cee1c3f8 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -21,6 +21,7 @@ #include #include #include /* pcntxt_mask */ +#include #ifdef CONFIG_X86_32 static struct saved_context saved_context; @@ -263,6 +264,43 @@ static int bsp_pm_callback(struct notifier_block *nb, unsigned long action, case PM_HIBERNATION_PREPARE: ret = bsp_check(); break; +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0 + case PM_RESTORE_PREPARE: + /* + * When system resumes from hibernation, online CPU0 because + * 1. it's required for resume and + * 2. the CPU was online before hibernation + */ + if (!cpu_online(0)) + _debug_hotplug_cpu(0, 1); + break; + case PM_POST_RESTORE: + /* + * When a resume really happens, this code won't be called. + * + * This code is called only when user space hibernation software + * prepares for snapshot device during boot time. So we just + * call _debug_hotplug_cpu() to restore to CPU0's state prior to + * preparing the snapshot device. + * + * This works for normal boot case in our CPU0 hotplug debug + * mode, i.e. CPU0 is offline and user mode hibernation + * software initializes during boot time. + * + * If CPU0 is online and user application accesses snapshot + * device after boot time, this will offline CPU0 and user may + * see different CPU0 state before and after accessing + * the snapshot device. But hopefully this is not a case when + * user debugging CPU0 hotplug. Even if users hit this case, + * they can easily online CPU0 back. + * + * To simplify this debug code, we only consider normal boot + * case. Otherwise we need to remember CPU0's state and restore + * to that state and resolve racy conditions etc. + */ + _debug_hotplug_cpu(0, 0); + break; +#endif default: break; }