rpms/kernel/devel linux-2.6-x86-sleazy-fpu.patch, NONE, 1.1 kernel-2.6.spec, 1.2375, 1.2376

fedora-cvs-commits at redhat.com
Wed Jul 12 05:13:12 UTC 2006


Author: davej

Update of /cvs/dist/rpms/kernel/devel
In directory cvs.devel.redhat.com:/tmp/cvs-serv3846

Modified Files:
	kernel-2.6.spec 
Added Files:
	linux-2.6-x86-sleazy-fpu.patch 
Log Message:
Add Arjan's sleazy-fpu hack

linux-2.6-x86-sleazy-fpu.patch:
 arch/i386/kernel/process.c   |   12 ++++++++++++
 arch/i386/kernel/traps.c     |    3 ++-
 arch/x86_64/kernel/process.c |   10 ++++++++++
 arch/x86_64/kernel/traps.c   |    1 +
 include/asm-i386/i387.h      |    5 ++++-
 include/asm-x86_64/i387.h    |    5 ++++-
 include/linux/sched.h        |    9 +++++++++
 7 files changed, 42 insertions(+), 3 deletions(-)

--- NEW FILE linux-2.6-x86-sleazy-fpu.patch ---
From: Chuck Ebbert <76306.1226 at compuserve.com>

i386 port of the sLeAZY-fpu feature.  Chuck reports that this gives him a +/-
0.4% improvement on his simple benchmark.

x86_64 description follows:

Right now the x86-64 kernel has 100% lazy FPU behavior: after *every*
context switch, the first FPU use takes a trap that restores the FPU
context lazily.  This is great for applications with sporadic or no FPU
use, since they avoid the expensive save/restore on every switch.  Very
frequent FPU users, however, take an extra trap on every context switch.

The patch below adds a simple heuristic to this code: after 5 consecutive
context switches with FPU use, the lazy behavior is disabled and the
context is restored on every context switch.  If the app does use the
FPU, the trap is avoided.  (The chance that the 6th time slice uses the
FPU when the previous 5 did is obviously quite high.)

After 256 switches this is reset and the lazy behavior returns (until
there are 5 consecutive FPU-using switches again).  This gives apps that
use the FPU in longer bursts the lazy behavior back after some time.

Signed-off-by: Chuck Ebbert <76306.1226 at compuserve.com>
Signed-off-by: Arjan van de Ven <arjan at linux.intel.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
---

 arch/i386/kernel/process.c |   12 ++++++++++++
 arch/i386/kernel/traps.c   |    3 ++-
 include/asm-i386/i387.h    |    5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff -puN arch/i386/kernel/process.c~sleazy-fpu-feature-i386-support arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c~sleazy-fpu-feature-i386-support
+++ a/arch/i386/kernel/process.c
@@ -630,6 +630,11 @@ struct task_struct fastcall * __switch_t
 
 	__unlazy_fpu(prev_p);
 
+
+	/* we're going to use this soon, after a few expensive things */
+	if (next_p->fpu_counter > 5)
+		prefetch(&next->i387.fxsave);
+
 	/*
 	 * Reload esp0.
 	 */
@@ -688,6 +693,13 @@ struct task_struct fastcall * __switch_t
 
 	disable_tsc(prev_p, next_p);
 
+	/* If the task has used fpu the last 5 timeslices, just do a full
+	 * restore of the math state immediately to avoid the trap; the
+	 * chances of needing FPU soon are obviously high now
+	 */
+	if (next_p->fpu_counter > 5)
+		math_state_restore();
+
 	return prev_p;
 }
 
diff -puN arch/i386/kernel/traps.c~sleazy-fpu-feature-i386-support arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c~sleazy-fpu-feature-i386-support
+++ a/arch/i386/kernel/traps.c
@@ -1048,7 +1048,7 @@ fastcall unsigned char * fixup_x86_bogus
  * Must be called with kernel preemption disabled (in this case,
  * local interrupts are disabled at the call-site in entry.S).
  */
-asmlinkage void math_state_restore(struct pt_regs regs)
+asmlinkage void math_state_restore(void)
 {
 	struct thread_info *thread = current_thread_info();
 	struct task_struct *tsk = thread->task;
@@ -1058,6 +1058,7 @@ asmlinkage void math_state_restore(struc
 		init_fpu(tsk);
 	restore_fpu(tsk);
 	thread->status |= TS_USEDFPU;	/* So we fnsave on switch_to() */
+	tsk->fpu_counter++;
 }
 
 #ifndef CONFIG_MATH_EMULATION
diff -puN include/asm-i386/i387.h~sleazy-fpu-feature-i386-support include/asm-i386/i387.h
--- a/include/asm-i386/i387.h~sleazy-fpu-feature-i386-support
+++ a/include/asm-i386/i387.h
@@ -76,7 +76,9 @@ static inline void __save_init_fpu( stru
 
 #define __unlazy_fpu( tsk ) do { \
 	if (task_thread_info(tsk)->status & TS_USEDFPU) \
-		save_init_fpu( tsk ); \
+		save_init_fpu( tsk ); 			\
+	else						\
+		tsk->fpu_counter = 0;			\
 } while (0)
 
 #define __clear_fpu( tsk )					\
@@ -118,6 +120,7 @@ static inline void save_init_fpu( struct
 extern unsigned short get_fpu_cwd( struct task_struct *tsk );
 extern unsigned short get_fpu_swd( struct task_struct *tsk );
 extern unsigned short get_fpu_mxcsr( struct task_struct *tsk );
+extern asmlinkage void math_state_restore(void);
 
 /*
  * Signal frame handlers...
_
From: Arjan van de Ven <arjan at linux.intel.com>

Right now the x86-64 kernel has 100% lazy FPU behavior: after *every*
context switch, the first FPU use takes a trap that restores the FPU
context lazily.  This is great for applications with sporadic or no FPU
use, since they avoid the expensive save/restore on every switch.  Very
frequent FPU users, however, take an extra trap on every context switch.

The patch below adds a simple heuristic to this code: after 5 consecutive
context switches with FPU use, the lazy behavior is disabled and the
context is restored on every context switch.  If the app does use the
FPU, the trap is avoided.  (The chance that the 6th time slice uses the
FPU when the previous 5 did is obviously quite high.)

After 256 switches this is reset and the lazy behavior returns (until
there are 5 consecutive FPU-using switches again).  This gives apps that
use the FPU in longer bursts the lazy behavior back after some time.

[akpm at osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan at linux.intel.com>
Cc: Andi Kleen <ak at muc.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
---

 arch/x86_64/kernel/process.c |   10 ++++++++++
 arch/x86_64/kernel/traps.c   |    1 +
 include/asm-x86_64/i387.h    |    5 ++++-
 include/linux/sched.h        |    9 +++++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/process.c~sleazy-fpu-feature-x86_64-support arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c~sleazy-fpu-feature-x86_64-support
+++ a/arch/x86_64/kernel/process.c
@@ -515,6 +515,10 @@ __switch_to(struct task_struct *prev_p, 
 	int cpu = smp_processor_id();  
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+	/* we're going to use this soon, after a few expensive things */
+	if (next_p->fpu_counter>5)
+		prefetch(&next->i387.fxsave);
+
 	/*
 	 * Reload esp0, LDT and the page table pointer:
 	 */
@@ -618,6 +622,12 @@ __switch_to(struct task_struct *prev_p, 
 		}
 	}
 
+	/* If the task has used fpu the last 5 timeslices, just do a full
+	 * restore of the math state immediately to avoid the trap; the
+	 * chances of needing FPU soon are obviously high now
+	 */
+	if (next_p->fpu_counter>5)
+		math_state_restore();
 	return prev_p;
 }
 
diff -puN arch/x86_64/kernel/traps.c~sleazy-fpu-feature-x86_64-support arch/x86_64/kernel/traps.c
--- a/arch/x86_64/kernel/traps.c~sleazy-fpu-feature-x86_64-support
+++ a/arch/x86_64/kernel/traps.c
@@ -1073,6 +1073,7 @@ asmlinkage void math_state_restore(void)
 		init_fpu(me);
 	restore_fpu_checking(&me->thread.i387.fxsave);
 	task_thread_info(me)->status |= TS_USEDFPU;
+	me->fpu_counter++;
 }
 
 void __init trap_init(void)
diff -puN include/asm-x86_64/i387.h~sleazy-fpu-feature-x86_64-support include/asm-x86_64/i387.h
--- a/include/asm-x86_64/i387.h~sleazy-fpu-feature-x86_64-support
+++ a/include/asm-x86_64/i387.h
@@ -24,6 +24,7 @@ extern unsigned int mxcsr_feature_mask;
 extern void mxcsr_feature_mask_init(void);
 extern void init_fpu(struct task_struct *child);
 extern int save_i387(struct _fpstate __user *buf);
+extern asmlinkage void math_state_restore(void);
 
 /*
  * FPU lazy state save handling...
@@ -31,7 +32,9 @@ extern int save_i387(struct _fpstate __u
 
 #define unlazy_fpu(tsk) do { \
 	if (task_thread_info(tsk)->status & TS_USEDFPU) \
-		save_init_fpu(tsk); \
+		save_init_fpu(tsk); 			\
+	else						\
+		tsk->fpu_counter = 0;			\
 } while (0)
 
 /* Ignore delayed exceptions from user space */
diff -puN include/linux/sched.h~sleazy-fpu-feature-x86_64-support include/linux/sched.h
--- a/include/linux/sched.h~sleazy-fpu-feature-x86_64-support
+++ a/include/linux/sched.h
@@ -817,6 +817,15 @@ struct task_struct {
 	struct key *thread_keyring;	/* keyring private to this thread */
 	unsigned char jit_keyring;	/* default keyring to attach requested keys to */
 #endif
+	/*
+	 * fpu_counter contains the number of consecutive context switches
+	 * in which the FPU is used. If this is over a threshold, the lazy fpu
+	 * saving becomes unlazy to save the trap. This is an unsigned char
+	 * so that after 256 times the counter wraps and the behavior turns
+	 * lazy again; this is to deal with bursty apps that only use FPU for
+	 * a short time
+	 */
+	unsigned char fpu_counter;
 	int oomkilladj; /* OOM kill score adjustment (bit shift). */
 	char comm[TASK_COMM_LEN]; /* executable name excluding path
 				     - access with [gs]et_task_comm (which lock
_


Index: kernel-2.6.spec
===================================================================
RCS file: /cvs/dist/rpms/kernel/devel/kernel-2.6.spec,v
retrieving revision 1.2375
retrieving revision 1.2376
diff -u -r1.2375 -r1.2376
--- kernel-2.6.spec	12 Jul 2006 03:41:09 -0000	1.2375
+++ kernel-2.6.spec	12 Jul 2006 05:13:04 -0000	1.2376
@@ -271,6 +271,7 @@
 Patch206: linux-2.6-x86-hp-reboot.patch
 Patch207: linux-2.6-x86_64-tif-restore-sigmask.patch
 Patch208: linux-2.6-x86_64-add-ppoll-pselect.patch
+Patch209: linux-2.6-x86-sleazy-fpu.patch
 
 # 300 - 399   ppc(64)
 Patch305: linux-2.6-cell-mambo-drivers.patch
@@ -754,6 +755,8 @@
 %patch207 -p1
 # Add ppoll and pselect syscalls
 %patch208 -p1
+# Arjan's sleazy fpu trick.
+%patch209 -p1
 
 #
 # PowerPC
@@ -1594,6 +1597,10 @@
 %endif
 
 %changelog
+* Wed Jul 12 2006 Dave Jones <davej at redhat.com>
+- Add sleazy fpu optimisation.  Apps that use floating
+  point heavily should, in theory, get faster.
+
 * Tue Jul 11 2006 Dave Jones <davej at redhat.com>
 - Add utrace. (ptrace replacement).
 



