Timing that code!

Submitted by WiffleCube on

Yet another 'Steptoe and Son' shoestring-budget software tip:

Remember programming the Z80/6502 where you could time code easily by changing the border colour when your code was executing?

Well, there is an instruction on Pentium+ class processors that can be used to count how many cycles your code takes: RDTSC. It's a pretty good alternative to using an expensive profiler, although it doesn't take into account things like graphics cards that process concurrently.

Here is the general idea; you can find complete code on the net:

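// 32-bit MSVC inline assembler; other compilers need a different syntax.
#include <iostream>

__int64 gstart, gend; // 64-bit integers (an MSVC type; "long long" elsewhere)
int gcycles;

void StartTimer()
{
    _asm
    {
        xor eax, eax                    // CPUID leaf 0
        cpuid                           // serialise: let earlier instructions finish
        rdtsc                           // time-stamp counter -> EDX:EAX
        mov dword ptr [gstart], eax     // store result in gstart (low half)
        mov dword ptr [gstart + 4], edx // high half
    }
}

void StopTimer()
{
    _asm
    {
        xor eax, eax
        cpuid                           // serialise again before reading
        rdtsc
        mov dword ptr [gend], eax       // store result in gend (low half)
        mov dword ptr [gend + 4], edx   // high half
    }
    gcycles = (int)(gend - gstart);     // truncates; fine for short runs
}

int main()
{
    StartTimer();
    // ....yourcode
    StopTimer();
    std::cout << "Cycles taken: " << gcycles << std::endl;
    return 0;
}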

Submitted by WiffleCube on Thu, 26/08/04 - 11:14 AM

Forgot to mention: you may get slightly different results each time you take a measurement, because other processes are being executed at the same time.

As an aside, the MSVC++ inline keyword is not just syntactic sugar. During disassembly I found that taking it out causes a function to be 'called', whereas leaving it in puts it straight in the execution flow.
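
One common way to tame that run-to-run variation is to repeat the measurement and keep the smallest reading, since interference from other processes only ever adds time. A minimal sketch, assuming a C++11 compiler; MinElapsedNs and its parameters are names made up for this example, and std::chrono::steady_clock stands in for whichever timer you actually use:

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: calls work() 'runs' times and returns the fastest
// wall-clock time in nanoseconds. If runs <= 0, work() is never called and
// the largest long long value is returned.
template <typename Work>
long long MinElapsedNs(Work work, int runs)
{
    long long best = std::numeric_limits<long long>::max();
    for (int i = 0; i < runs; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (ns < best)
            best = ns; // keep the least-disturbed run
    }
    return best;
}
```

Something like MinElapsedNs([] { /* yourcode */ }, 20) would then report the quietest of twenty runs.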

Submitted by Daemin on Thu, 26/08/04 - 10:29 PM

For timing code on Windows computers just use the built-in high resolution timer, it's that simple. And you're using __int64 there, which is IIRC a Visual Studio keyword, not a GCC one.

It might be useful for Linux and other operating systems running on x86 hardware. However, generally I now try to stay away from doing assembly stuff in my programming; it really isn't worth the effort for anything other than consoles.
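
For what it's worth, both compilers offer an __rdtsc() intrinsic, which avoids both the inline assembler and the __int64 keyword. The sketch below is only an illustration: ReadCycleCounter and HAVE_RDTSC are made-up names, the non-x86 fallback to std::chrono is an assumption of this example, and unlike the original post there is no CPUID beforehand to serialise the pipeline.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // MSVC: declares __rdtsc()
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>  // GCC/Clang: declares __rdtsc()
#define HAVE_RDTSC 1
#else
#include <chrono>       // not x86: fall back to a clock (assumption of this sketch)
#define HAVE_RDTSC 0
#endif

// Hypothetical helper: returns the time-stamp counter on x86, or
// steady_clock ticks on anything else.
inline std::uint64_t ReadCycleCounter()
{
#if HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Reading the counter before and after yourcode and subtracting gives the same kind of figure as gcycles in the original post.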

Submitted by Blitz on Thu, 26/08/04 - 11:24 PM

Yeah, you can just use QueryPerformanceCounter etc. on win32 systems and that way you don't even need to do any checks to see what the CPU is, the Query function will just fail if the CPU doesn't support it.
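
A rough sketch of the QueryPerformanceCounter approach, paired with QueryPerformanceFrequency to convert ticks into time. HighResMicroseconds is a made-up name, and the std::chrono branch for non-Windows builds is an addition of this example, not something from the thread:

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <chrono>
#endif

// Hypothetical helper: current reading of a high-resolution timer in
// microseconds, or -1 if the timer is unavailable.
inline std::int64_t HighResMicroseconds()
{
#ifdef _WIN32
    LARGE_INTEGER frequency; // ticks per second
    LARGE_INTEGER counter;   // ticks since some fixed starting point
    if (!QueryPerformanceFrequency(&frequency) || !QueryPerformanceCounter(&counter))
        return -1;           // the Query functions report failure themselves
    // Split into whole seconds and remainder to avoid overflowing counter * 1000000.
    const std::int64_t seconds = counter.QuadPart / frequency.QuadPart;
    const std::int64_t remainder = counter.QuadPart % frequency.QuadPart;
    return seconds * 1000000 + (remainder * 1000000) / frequency.QuadPart;
#else
    // Not Windows: steady_clock is this sketch's stand-in.
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
#endif
}
```

Take one reading before yourcode and one after; the difference is the elapsed time in microseconds.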

But anyway, the benefit of using expensive profilers is not the timing, but that they usually give you nice information on a function-by-function basis for the entire program, without having to insert any additional code in your program. At least that's what I read/hear :P

"As an aside, the MSVC++ inline keyword is not just syntactic sugar. During disassembly I found that taking it out causes a function to be 'called', whereas leaving it in puts it straight in the execution flow."

Yup, that's what it's supposed to do :P
And "inline" is a C++ keyword, not MSVC++ specific...
CYer, Blitz

Submitted by WiffleCube on Sat, 28/08/04 - 12:20 AM

It's good that you can do that wrapped up in a call. Didn't know it existed, or how it's implemented. If it clears the pipeline beforehand it'd be useful for measurement, but if it leaves stuff there then the code you're trying to tighten up might be measured inaccurately.
I wouldn't advocate writing everything in assembler, but timing allows different C++ approaches to be compared for things like cache behaviour.
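
As an illustration of comparing two C++ approaches by timing them, here is a small sketch that sums the same grid in row order and in column order. All the names (SumRowMajor, SumColumnMajor, TimeMs) are made up for this example, and std::chrono::steady_clock is used in place of RDTSC to keep it portable:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sums a rows x cols grid stored row by row, reading memory in order.
inline long long SumRowMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols)
{
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += grid[r * cols + c];
    return sum;
}

// Same sum, but each read jumps 'cols' elements ahead of the previous one.
inline long long SumColumnMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols)
{
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += grid[r * cols + c];
    return sum;
}

// Times one call of fn() in milliseconds and stores its return value in
// 'result' so the compiler cannot discard the work.
template <typename Fn>
double TimeMs(Fn fn, long long& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Both functions return the same total, so any difference in the timings comes from the access pattern; on a grid much larger than the cache the column-order version is usually the slower one, but measure on your own hardware.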

On the subject of the 'inline' keyword, of course it's a standard C++ keyword, as is 'register', but depending on your compiler they may or may not do anything. In MSVC++ with optimisation turned on, forgetting to inline code sometimes still results in a call, which is quite surprising. Probably some flag settings I missed.

Submitted by Blitz on Sat, 28/08/04 - 4:05 AM

There are compiler options in VC++ to turn inlining on/off (should be under c/c++->optimisation->inline function expansion). The inline keyword itself is only a hint to the compiler anyway; the compiler can still choose not to inline code that you have specified as such.

"In MSVC++ with optimisation turned on, forgetting to inline code sometimes still results in a call, which is quite surprising."

Are you talking about code that you have implicitly inlined by defining it inside the class declaration? If so, then it's probably a case of what I stated above: the compiler decided the function wasn't worth inlining. If it's a function defined outside the class, then afaik VC++ will never inline that code unless you give it the inline hint...
You can use the MSVC++ specific "__forceinline" keyword to give the compiler a very strong hint that you want certain code inline.
CYer, Blitz
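
To make the difference concrete, a small sketch of a plain inline hint next to the stronger request. MSVC spells its keyword __forceinline; the FORCE_INLINE macro and the function names are made up here, and the GCC/Clang always_inline attribute is an addition of this example that the thread doesn't mention:

```cpp
// Hypothetical macro: pick the strongest "please inline" spelling available.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline // fall back to the ordinary hint
#endif

// Ordinary hint: the compiler is free to emit a call instead.
inline int Square(int x)
{
    return x * x;
}

// Stronger request: the compiler inlines this wherever it is able to.
FORCE_INLINE int Cube(int x)
{
    return x * x * x;
}
```

Either way the result is the same; only the disassembly tells you whether a call was actually emitted.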

Submitted by CombatWombat on Sat, 28/08/04 - 5:21 PM

There is one case I know of where inlining actually made the code go slower :) It was under BeOS, running on a BeBox with 2x PowerPC 603e processors; the L1 cache was something like 16K for instructions. It turned out that inlining this particular function (which was (A) long and (B) used in many other parts of the code) bloated a loop to well over what would fit in cache, hence lots of cache misses :) (The program was calculating Pi or some equally nerdly pursuit...)

Ah now there was a cool case mod that was ahead of its time (the LEDs on either side of the case showed the processor load):

http://www.litmusgreen.com/ed/bebox.jpeg

Submitted by WiffleCube on Sun, 29/08/04 - 2:26 AM

Blitz: That's useful to know, especially when you're trying to keep the execution sequence branchless.

Combat Wombat: Looks good, got to admit I've never really got into hardware, mainly I think because I'm squeamish; all that exposed hardware reminds me of an operating theatre *shiver*

Submitted by dom on Sun, 10/10/04 - 3:33 AM

Regarding the RDTSC instruction, you need to remember that it might not work as intended in a multi-processor environment, since each CPU has its own counter. For that you'll need to keep track of which CPU is executing the RDTSC instruction. Also, there is a chance that some OSes don't allow the execution of RDTSC, in which case the CPU triggers an exception.
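
One way to sidestep the multi-processor issue is to pin the measuring thread to a single CPU for the duration of the test. A hedged sketch: PinCurrentThreadToOneCpu is a made-up name, the Windows branch uses SetThreadAffinityMask, the Linux branch uses sched_getaffinity/sched_setaffinity (glibc with _GNU_SOURCE, which g++ defines by default), and other platforms simply report failure:

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: restricts the calling thread to one CPU so that
// successive RDTSC reads come from the same counter. Returns true on success.
inline bool PinCurrentThreadToOneCpu()
{
#if defined(_WIN32)
    // Bit 0 of the mask selects the first logical processor.
    return SetThreadAffinityMask(GetCurrentThread(), 1) != 0;
#elif defined(__linux__)
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return false;
    // Pick the first CPU this thread is currently allowed to run on.
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &allowed))
        {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0;
        }
    }
    return false;
#else
    return false; // no affinity API assumed on other platforms
#endif
}
```

Calling it once before StartTimer() keeps both readings on the same processor; it does nothing about an OS that forbids RDTSC outright.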
