Short:        Patch CopyMem/Quick for 68060(040) v1.4
Author:       dbusse@primus-online.de (Dirk Busse)
Uploader:     dbusse primus-online de (Dirk Busse)
Type:         util/boot
Requires:     68060 or 68040
Architecture: m68k-amigaos

Description:

This is a small patch which replaces the CopyMem and CopyMemQuick
functions of exec.library. The new functions are optimized for the 68060
processor. They should also work with the 68040 processor.

The patch tests for a 68040 or 060 processor. If it can't find one, it
doesn't install the patch and exits with a return code of 20 (=fail).
It also fails if it can't allocate the necessary memory.

In some cases the new functions are four times faster than the original
functions.

Installation:

Just copy CMQ060 into c: and insert CMQ060 into your s:Startup-Sequence.

Some notes about Move16:

Move16 is a new assembler instruction of the 68040 and 060 processors.
It moves 16 bytes at once, using burst accesses.

Andreas Kleinert and Thomas Richter told me that there could be problems
with the Move16 instruction on the Amiga, especially in Chipram, caused
by the DMA of the custom chips. I couldn't reproduce such an error, but
it may happen on other systems.

So V1.4 of CMQ060 doesn't use Move16 from or into memory below $01000000
(Chipram, ZorroII-Fastram, I/O-Space, Kickstart,...). Move16 is only
used when the source and destination addresses are both higher than
$00ffffff (32-bit-Fastram,...).

(If you didn't get any errors with V1.3 and want the biggest speed
improvement, you can use CMQ060Move16. It is identical to CMQ060 V1.3
and also uses Move16 into and from Chipram. But you may get problems.)

(If you want to avoid all problems which Move16 could cause [the 68040
has some Move16 bugs], you should use Aminet:util/boot/CMQ030. It never
uses Move16 and is still faster than the other available patches.)

The source code is also in the archive.
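The address rule from the Move16 notes can be sketched in C as follows.
This is only an illustration and not the code from the archive (the
patch itself is written in assembler). The names may_use_move16 and
MOVE16_MIN_ADDR are hypothetical.

```c
#include <stdint.h>

/* Hypothetical name: lowest address at which Move16 is allowed.
   Everything below $01000000 (Chipram, ZorroII-Fastram, I/O-Space,
   Kickstart,...) is excluded. */
#define MOVE16_MIN_ADDR 0x01000000u

/* Returns 1 only if BOTH addresses are higher than $00ffffff,
   otherwise 0. */
static int may_use_move16(uint32_t src, uint32_t dst)
{
    return src >= MOVE16_MIN_ADDR && dst >= MOVE16_MIN_ADDR;
}
```

For example, may_use_move16(0x08000000, 0x08100000) gives 1, while
may_use_move16(0x00020000, 0x08100000) gives 0, because the source
address lies in Chipram.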
Author:

Dirk Busse
Kropsburgstraße 8
D-67141 Neuhofen
Germany
<100.141999@germanynet.de>

How often are these functions used?

Some people told me they couldn't notice a speed improvement. You won't
get a speed improvement by a factor of two. But there is a small speed
improvement, even if you can't notice it. To show you how often the
patched functions are called, I've added two modified patches to
Version 1.1b of this archive.

CMQ060beep:
Every time one of the patched functions CopyMem or CopyMemQuick is
called, your AMIGA makes a DisplayBeep. After calling LoadWB your AMIGA
beeps many times per second. If you boot your AMIGA without
Startup-Sequence and install CMQ060beep, you can see that every AMIGA
dos command like Dir, List, Avail, Resident... uses the patched
functions. They all use the CopyMem function, and this is the function
with the biggest speed improvement.

CMQ060beepCMQ:
This only makes a DisplayBeep if the patched CopyMemQuick function is
called. So it shows you which programs use the patched CopyMemQuick
function. For example: PageStream3.3 while moving a scrollbar or making
a redraw, or TeleInfo2, or ... .

The two patches above aren't for real use. They only demonstrate how
often the functions are used.

Speed comparison:

There are already some similar patches available on the Aminet:

  CopyMemQuicker V2.8 from 1994 -> Aminet:util/boot/COPMQR28.lha
  PCM V1.0 from 1996            -> Aminet:util/boot/PCM_1.0.lha

MCP also patches these functions.

CopyMemQuicker is optimized for the 68000, 010 and 020 processors. But
on a 68060 (I think also on a 68040) more speed improvement is
possible.

PCM is optimized for the 68040 and 060 processors. But some copy modes
like Long to Even aren't optimized, and the copy mode Long+1 to Even+1
takes twice as long as the original exec function. PCM works only with
a 68040 or 060, because it also uses the Move16 instruction (see the
note above).
In a lot of cases the patched functions from MCP are the slowest of
all. Some copy modes are even slower than the original Kickstart 3.1
functions.

Here are some test results. All results were measured on the same
AMIGA 2000 with a DKB WildFire060-50MHz:

"TestIt" from CopyMemQuicker V2.8

                      original     CopyMemQuicker MCP      PCM   CMQ030 CMQ060 CMQ060
                      Kickstart3.1 V2.8           V1.32b12 V1.0  V1.1   V1.4   Move16
                      routines                                                 V1.4
CopyMem
565×64kB    L->L      1.85         1.85           1.85     1.35  1.79   1.31   1.31
147×64kB    L->L+1    1.33         1.14           1.07     1.07  0.47   0.45   0.47
413×64kB    L->E      2.21         2.21           2.21     2.23  1.31   1.31   1.31
147×64kB    L->E+1    1.35         1.15           1.07     1.07  0.45   0.45   0.45
147×64kB    L+1->L    1.35         1.15           0.51     0.47  0.47   0.45   0.47
382×64kB    L+1->L+1  2.11         1.23           2.88     0.91  1.21   0.89   0.87
147×64kB    L+1->E    1.33         1.15           0.81     0.79  0.47   0.45   0.47
501×64kB    L+1->E+1  1.71         1.70           3.81     3.71  1.59   1.57   1.59
501×64kB    E->L      1.71         1.71           1.75     1.59  1.59   1.59   1.59
147×64kB    E->L+1    1.33         1.15           1.11     1.07  0.47   0.47   0.47
382×64kB    E->E      2.11         1.23           2.13     0.91  1.21   0.87   0.89
147×64kB    E->E+1    1.35         1.13           1.13     1.09  0.47   0.45   0.45
147×64kB    E+1->L    1.33         1.15           0.51     0.45  0.45   0.47   0.47
413×64kB    E+1->L+1  2.19         2.19           3.15     3.05  1.31   1.29   1.29
147×64kB    E+1->E    1.33         1.15           0.81     0.79  0.45   0.45   0.45
564×64kB    E+1->E+1  1.81         1.81           4.31     1.35  1.79   1.31   1.31
33900×1kB   L->L      1.10         1.11           1.13     1.31  1.03   1.04   1.07
9400×1kB    L->L+1    1.17         0.93           0.91     0.86  0.29   0.29   0.27
24000×1kB   E->E      1.70         0.80           1.68     0.92  0.74   0.75   0.75
196000×128B L->L      1.02         0.73           1.03     1.04  0.75   0.75   0.75
155000×128B E->E      1.61         0.63           1.55     1.05  0.62   0.60   0.59
588000×19B  L->L      0.83         0.60           1.43     0.74  0.50   0.51   0.49
622000×18B  L->L      0.81         0.51           1.43     0.77  0.51   0.49   0.51
663000×17B  L->L      0.75         0.70           1.47     0.73  0.52   0.52   0.50
956000×16B  L->L      0.79         0.71           1.98     1.00  0.58   0.51   0.50
1060000×8B  L->L      0.85         0.79           1.17     1.01  0.58   0.52   0.52
1430000×4B  L->L      0.73         0.61           1.09     1.14  0.45   0.39   0.41
2190000×1B  L->L      0.67         0.61           0.73     0.84  0.33   0.57   0.62

CopyMemQuick
565×64kB    L->L      1.85         1.87           1.85     1.33  1.79   1.31   1.29
33900×1kB   L->L      1.09         1.11           1.13     0.89  1.03   1.03   1.07
196000×128B L->L      0.99         0.71           1.03     0.81  0.73   0.73   0.73
956000×16B  L->L      0.69         0.63           0.88     0.94  0.38   0.39   0.38
1060000×8B  L->L      0.47         0.57           0.71     0.60  0.40   0.40   0.40
1430000×4B  L->L      0.35         0.51           0.73     0.52  0.23   0.21   0.25

"Test" from PCM V1.0
("Test" moves a block of 500,000 bytes ten times)

Fast->Fast
  CopyMem             0.26         0.26           0.18     0.18  0.24   0.18   0.18
  CopyMemQuick        0.26         0.26           0.18     0.20  0.26   0.18   0.18
Chip->Fast
  CopyMem             1.98         1.98           1.96     2.16  2.16   2.15   1.98
  CopyMemQuick        1.98         1.98           1.98     2.16  2.16   2.16   1.98
Fast->Chip
  CopyMem             1.92         1.91           1.92     1.90  1.90   1.90   1.90
  CopyMemQuick        1.92         1.92           1.92     1.90  1.90   1.88   1.90
Chip->Chip
  CopyMem             3.64         3.62           3.64     3.70  3.96   3.96   3.72
  CopyMemQuick        3.62         3.62           3.62     3.70  3.94   3.94   3.72

History:

1.0  (12.Sep.1998)
     - First public version.

1.1  (15.Sep.1998)
     - V1.0 exits with a return code of 10 (=error) if it can't find a
       68040 or 68060 or can't get the necessary memory. V1.1 exits,
       in these cases, with a return code of 20 (=fail).
     - Fixed a mistake in the readme.

1.1b (19.Sep.1998)
     (I didn't change the patch itself! It's the same as V1.1)
     - Added the test results of MCP V1.30 to the readme.
     - Added CMQ060beep and CMQ060beepCMQ (see above).

1.2  (29.Nov.1998)
     - Added the test results of MCP V1.32b12 to the readme.
     - Changed the source code. There was a problem with an incorrectly
       written program which expects the address of the last source
       byte +1 in A0 and the address of the last destination byte +1
       in A1. This version of CMQ060 solves problems with such badly
       written programs. It's now 100 bytes longer, but the speed is
       the same. Big moves by the CopyMem function will be one or two
       cycles faster, but you won't notice it.

1.3  (5.Jan.1999)
     The changes made in this version don't affect the speed. They are
     only meant to avoid problems with future versions of AMIGA OS.
     - Changed the version string to the "standard" format.
     - Changed BMI to BCS and BPL to BCC
       -> now CMQ030 could move blocks bigger than 2 GigaByte ;-)

1.4  (3.Apr.1999)
     - CMQ060 now doesn't use Move16 into/from memory below $01000000.
     - Added CMQ060Move16 (it's the same as CMQ060 V1.3).
     - Added the test results of CMQ030 (which never uses Move16).
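
The effect of the BMI to BCS change listed under V1.3 can be
illustrated with a small C model. This is a sketch of the general idea
and not the loop from the source code: it assumes a byte count in a
32-bit register from which a chunk size is subtracted, followed by a
conditional branch. BMI looks at the sign bit of the result, so a count
of 2 GigaByte or more already looks negative. BCS looks at the borrow
of the subtraction, so it is only taken when the count is really
smaller than the chunk. The function names are hypothetical.

```c
#include <stdint.h>

/* Models a subtraction "count - chunk" followed by BMI:
   the branch is taken if bit 31 of the result is set. */
static int branch_bmi(uint32_t count, uint32_t chunk)
{
    uint32_t result = count - chunk;
    return (result & 0x80000000u) != 0;
}

/* Models the same subtraction followed by BCS:
   the branch is taken if the subtraction needs a borrow. */
static int branch_bcs(uint32_t count, uint32_t chunk)
{
    return count < chunk;
}
```

With a count of $90000000 and a chunk of 16, branch_bmi gives 1 (the
copy loop would stop too early) while branch_bcs gives 0. For small
counts both agree: with a count of 8 both give 1, with a count of 64
both give 0.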