Mike's notes on using the QCDOC

This is a very brief outline of how to program the QCDOC. See the
documentation listed at the end for more details. This setup is likely
to change in later versions of qos. If so, adapt whatever example
programs you see under $QOS/quser/ or whatever this evolves into.

-------------------------------------------
Summary:

  write a C++ program
  slogin to a host machine, e.g. qcdochost0.phys.columbia.edu
  source setup.csh for the appropriate os version, e.g.
      source /qcdoc/sfw/qos/qos-stable-2-2-2/scripts/setup.csh
  compile
  claim an available machine from the web page, e.g.
      https://qcdoc.phys.columbia.edu
  set QMACHINE=machine name, e.g. ssbp2/01node/mach01
  qsession $QMACHINE
  qinit $QMACHINE
  qpartition_connect -p 0
  qreset_boot
  qdiscover
  qpartition_remap -T0 -X1 -Y2 -Z3 -S4 -W5
  qrun your program
  qnodes_print -b stdout
  qdetach
  release machine on web page

----------------------------------------------
Start with a C++ program, e.g. hello.C :

  #include <stdio.h>
  int main(){
    printf("hello world\n");
    return 0;
  }

Note the suffix .C, which indicates C++. It is easiest to use the
language the OS uses naturally. C++ is mostly backward compatible
with C.

-----------------------------------------------
Compiling:

  source /qcdoc/sfw/qos/qos-stable-2-2-2/scripts/setup.csh

(or the appropriate os version). This sets up include and library paths
and GCC_EXEC_PREFIX for the cross compiler.

Include $QOS/quser/Makefile.usr.rules in your makefile:

  include $(QOS)/quser/Makefile.user.rules
  all: hello.x
  clean:
  	- rm -f $(CLEANFILES)
  hello.x: $(OBJECTS)
  	$(CXX) $(OBJECTS) -o hello.x

Now type "make" and edit until it works, giving hello.x

If you want to use other libraries you may need to specify them
explicitly, e.g. for QMP you need -lqcd_api

----------------------------------------
Booting the machine:

  claim machine from web page
  set QMACHINE=machine name
  qsession $QMACHINE
  qinit $QMACHINE
  qpartition_connect -p 0
  qreset_boot
  qdiscover
  qpartition_remap -T0 -X1 -Y2 -Z3 -S4 -W5

These should be run from a directory in which you have write
permission.

The final step gives the geometry remapping you want. As written
above, it does nothing. On, say, a 1x1x1x2x2x2 machine

  qpartition_remap -T345 -X1 -Y2 -Z0

will give a configuration that mimics an 8x1x1x1x1x1 machine. On such a
configuration, however, do not try transfers in the S or W directions.

------------------------------------------
Running your program:

  qrun hello.x

when done

  qdetach
  exit
  release machine from web page

Please don't forget the last step!

With the current stable OS version this leaves a disconnected qdaemon
running. This should be killed by hand or you will have trouble when
you go back to the same machine.

  ps -u
  kill -9

other useful commands:

  qnodes_print -b stdout
  qhelp
  qkill

-----------------------------------------
Example including some simple operating system functions:

  #include <stdio.h>
  #include
  int main(){
    DefaultSetup(); /* necessary for os functions to work */
                    /* expected to go away in future versions of qos */
    printf("hello world from processor %d\n",UniqueID());
    printf("this machine has %d nodes\n",NumNodes());
    printf("geometry = %d by %d by %d by %d by %d by %d\n",
           SizeT(),SizeX(),SizeY(),SizeZ(),SizeS(),SizeW());
    printf("node coordinates = (%d, %d, %d, %d, %d, %d)\n",
           CoorT(),CoorX(),CoorY(),CoorZ(),CoorS(),CoorW());
    return 0;
  }

-----------------------------------------
Example showing communication between nodes:

  #include <stdio.h>
  #include
  #include
  #define BUFFERSIZE 80
  int main(){
    SCUDirArgIR send, receive;
    char *mybuffer1,*mybuffer2;
    DefaultSetup();
    /* allocate and fill communication buffers */
    mybuffer1 = (char *) qalloc(QNONCACHE|QCOMMS|QFAST,BUFFERSIZE);
    mybuffer2 = (char *) qalloc(QNONCACHE|QCOMMS|QFAST,BUFFERSIZE);
    sprintf(mybuffer1,"hello world from processor %d",UniqueID());
    sprintf(mybuffer2,"incoming message will go here");
    printf("mybuffer1: %s\nmybuffer2: %s\n",mybuffer1,mybuffer2);
    /* transfer buffer1 to buffer2 down the W direction */
    send.Init(mybuffer1,SCU_WM,SCU_SEND,BUFFERSIZE,1,8);
    receive.Init(mybuffer2,SCU_WP,SCU_REC,BUFFERSIZE,1,8);
    send.StartTrans();
    receive.StartTrans();
    send.TransComplete();
    receive.TransComplete();
    printf("mybuffer1: %s\nmybuffer2: %s\n",mybuffer1,mybuffer2);
    qfree(mybuffer1);
    qfree(mybuffer2);
    return 0;
  }

-----------------------------------------
Simple ways to crash the machine (as of qos-stable-2-2-2):

  start an SCU receive without a corresponding send and then exit
  transfer in an unmapped direction
  transfer with block size 0
  call DefaultSetup() before entering main(), e.g. in the constructor
      for a global object
  transfer after qreset_boot but forgetting qdiscover

-----------------------------------------
Recovering from a crash:

  qdiscover

might do it; if not then

  qreset_boot
  qdiscover

Either of these may need to be repeated, particularly the qdiscover
step.

When things get really bad, it sometimes seems necessary to go to one
of the newer unstable versions of the operating system and qreset from
there.

-----------------------------------------
Documentation:

These example files and a few others are at
  http://thy.phy.bnl.gov/~creutz/qcdoc/

Chulwoo's "An introduction to QCDOC Computer"; a copy is in the above
directory.

Balint's notes at
  http://www.ph.ed.ac.uk/~bj/QCDOC_quick_start.html

Various files in $QOS/quser/include/qcdocos/, e.g.
  $QOS/quser/include/qcdocos/syscalls.h
  $QOS/quser/include/qcdocos/scu_dir_arg.h
  $QOS/quser/include/qcdocos/scu_enum.h

Also look in
  $QOS/quser/usertest
  $QOS/quser/scu_simple

Most content in the includes is adapted from the qcdsp. Several of the
qcdsp calls are not yet implemented properly; test everything.