The HyperNews Linux KHG Discussion Pages

How System Calls Work on Linux/i86

This section covers first the mechanisms provided by the 386 for handling system calls, and then shows how Linux uses those mechanisms. This is not a reference to the individual system calls: There are very many of them, new ones are added occasionally, and they are documented in man pages that should be on your Linux system.

What Does the 386 Provide?

The 386 recognizes two event classes: exceptions and interrupts. Both cause a forced context switch to a new procedure or task. Interrupts can occur at unexpected times during the execution of a program and are used to respond to signals from hardware. Exceptions are caused by the execution of instructions.

The 386 recognizes two sources of interrupts: maskable interrupts and nonmaskable interrupts. It also recognizes two sources of exceptions: processor-detected exceptions and programmed exceptions.

Each interrupt or exception has a number, which is referred to by the 386 literature as the vector. The NMI interrupt and the processor detected exceptions have been assigned vectors in the range 0 through 31, inclusive. The vectors for maskable interrupts are determined by the hardware. External interrupt controllers put the vector on the bus during the interrupt-acknowledge cycle. Any vector in the range 32 through 255, inclusive, can be used for maskable interrupts or programmed exceptions. Here is a listing of all the possible interrupts and exceptions:

  Vector   Description
  0        divide error
  1        debug exception
  2        NMI interrupt
  3        breakpoint
  4        INTO-detected overflow
  5        BOUND range exceeded
  6        invalid opcode
  7        coprocessor not available
  8        double fault
  9        coprocessor segment overrun
  10       invalid task state segment
  11       segment not present
  12       stack fault
  13       general protection
  14       page fault
  15       reserved
  16       coprocessor error
  17-31    reserved
  32-255   maskable interrupts
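
The ranges in the table can be expressed as a small classification routine. This is only a sketch of the table as printed here: the enum and the function name classify_vector() are made up for illustration, and the grouping follows the 386 assignments listed above, not any particular kernel source.

```c
#include <assert.h>

/* Broad classes of 386 vectors, following the table above. */
enum vector_class {
    VEC_EXCEPTION,   /* exception with a fixed vector (0, 1, 3-14, 16) */
    VEC_NMI,         /* vector 2 */
    VEC_RESERVED,    /* 15 and 17-31 */
    VEC_MASKABLE     /* 32-255: maskable interrupts or programmed exceptions */
};

/* Classify a vector number (hypothetical helper, not a kernel function). */
enum vector_class classify_vector(int vector)
{
    assert(vector >= 0 && vector <= 255);

    if (vector == 2)
        return VEC_NMI;
    if (vector == 15 || (vector >= 17 && vector <= 31))
        return VEC_RESERVED;
    if (vector <= 31)
        return VEC_EXCEPTION;
    return VEC_MASKABLE;
}
```

Under this sketch, classify_vector(0x80) yields VEC_MASKABLE, which is the range Linux draws its system call vector from.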

The priority of simultaneous interrupts and exceptions is:

  HIGHEST   Faults except debug faults
     .      Trap instructions INTO, INT n, INT 3
     .      Debug traps for this instruction
     .      Debug traps for next instruction
     .      NMI interrupt
  LOWEST    INTR interrupt

How Linux Uses Interrupts and Exceptions

Under Linux, a system call is invoked by a programmed exception: the instruction int 0x80 transfers control to the kernel through vector 0x80, which lies in the range available to maskable interrupts and programmed exceptions. This vector is initialized during system startup, along with other important vectors like the system clock vector.
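
As a rough illustration of the int 0x80 path, the sketch below asks the kernel for the process ID without going through the getpid() library wrapper. raw_getpid() is a hypothetical name. The use of eax for the call number and the return value is the i386 convention and is an assumption not spelled out in the text above. On any other architecture the sketch falls back to the C library's syscall() function, which enters the kernel by whatever mechanism that architecture uses.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: fetch the process ID straight from the kernel. */
long raw_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* eax carries the call number in and the return value out. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "0" ((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Not i386: let the C library choose the kernel entry mechanism. */
    return syscall(SYS_getpid);
#endif
}
```

Calling raw_getpid() should return the same value as getpid() in the same process.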

iBCS2 requires an lcall 0,7 instruction, which Linux can pass to the iBCS2 compatibility module when an iBCS2-compliant binary is being executed. In fact, Linux will assume that an iBCS2-compliant binary is being executed if an lcall 0,7 call is executed, and will automatically switch modes.

As of version 0.99.2 of Linux, there are 116 system calls. Documentation for these can be found in the man (2) pages. When a user invokes a system call, execution flow is as follows:

How Linux Initializes the System Call Vectors

The startup_32() code found in /usr/src/linux/boot/head.S starts everything off by calling setup_idt(). This routine sets up an IDT (Interrupt Descriptor Table) with 256 entries, each 8 bytes long, for a total of 2048 bytes. No interrupt entry points are actually loaded by this routine, as that is done only after paging has been enabled and the kernel has been moved to 0xC0000000.

When start_kernel() (found in /usr/src/linux/init/main.c) is called, it invokes trap_init() (found in /usr/src/linux/kernel/traps.c). trap_init() sets up the IDT via the macro set_trap_gate() (found in /usr/include/asm/system.h) and initializes the interrupt descriptor table as shown here:

  Vector   Handler
  0        divide_error
  1        debug
  2        nmi
  3        int3
  4        overflow
  5        bounds
  6        invalid_op
  7        device_not_available
  8        double_fault
  9        coprocessor_segment_overrun
  10       invalid_TSS
  11       segment_not_present
  12       stack_segment
  13       general_protection
  14       page_fault
  15       reserved
  16       coprocessor_error
  17       alignment_check
  18-48    reserved

At this point the interrupt vector for the system calls is not set up. It is initialized by sched_init() (found in /usr/src/linux/kernel/sched.c). A call to set_system_gate(0x80, &system_call) sets interrupt 0x80 to be a vector to the system_call() entry point.
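
To give a feel for what installing a gate involves, here is a minimal user-space sketch that packs one IDT entry, assuming the 386 protected-mode gate format of 8 bytes per descriptor. The struct, the enum, and make_gate() are hypothetical names, not the kernel's own definitions. The assumption that a system gate is a trap gate with descriptor privilege level 3 (so that user code is allowed to execute int 0x80) is mine and is not stated in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* One 386 gate descriptor, viewed as two 32-bit words (8 bytes in all). */
struct gate_desc {
    uint32_t lo;   /* selector in bits 31-16, handler offset bits 15-0 */
    uint32_t hi;   /* handler offset bits 31-16, then P, DPL and type  */
};

enum { GATE_INTERRUPT = 14, GATE_TRAP = 15 };   /* 32-bit gate types */

/* Hypothetical helper: build a gate for a handler at address `handler`
 * reached through code segment `selector`. */
struct gate_desc make_gate(uint32_t handler, uint16_t selector,
                           unsigned type, unsigned dpl)
{
    struct gate_desc g;

    assert(dpl <= 3);
    assert(type == GATE_INTERRUPT || type == GATE_TRAP);

    g.lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    g.hi = (handler & 0xFFFF0000u)
         | 0x8000u          /* P: descriptor is present            */
         | (dpl << 13)      /* privilege level required of caller  */
         | (type << 8);     /* interrupt gate or trap gate         */
    return g;
}
```

With a made-up handler address of 0x12345678 and a made-up selector of 0x10, make_gate(0x12345678, 0x10, GATE_TRAP, 3) produces the words 0x00105678 and 0x1234EF00. A table of 256 such descriptors occupies 256 * 8 = 2048 bytes.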

How to Add Your Own System Calls

  1. Create a directory under the /usr/src/linux/ directory to hold your code.
  2. Put any include files in /usr/include/sys/ and /usr/include/linux/.
  3. Add the relocatable module produced by the link of your new kernel code to the ARCHIVES and the subdirectory to the SUBDIRS lines of the top level Makefile. See fs/Makefile, target fs.o for an example.
  4. Add a #define __NR_xx to unistd.h to assign a call number for your system call, where xx, the index, is something descriptive relating to your system call. It will be used to set up the vector through sys_call_table to invoke your code.
  5. Add an entry point for your system call to the sys_call_table in sys.h. It should match the index (xx) that you assigned in the previous step. The NR_syscalls variable will be recalculated automatically.
  6. Modify any kernel code in kernel/, fs/, mm/, etc. to take into account the environment needed to support your new code.
  7. Run make from the top level to produce the new kernel incorporating your new code.
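
Steps 4 and 5 above can be pictured with a small user-space model of a call table. Everything in this sketch is hypothetical: the NR_demo_* numbers stand in for the #define __NR_xx lines, demo_call_table stands in for sys_call_table, and demo_dispatch() imitates the lookup that the system call entry point performs. It is meant to show why the table position must match the call number, not to reproduce the kernel's code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical call numbers, standing in for the #define __NR_xx lines. */
#define NR_demo_getanswer 0
#define NR_demo_double    1

typedef long (*syscall_fn)(long arg);

static long sys_demo_getanswer(long unused) { (void)unused; return 42; }
static long sys_demo_double(long x)         { return 2 * x; }

/* The position of each entry must equal its call number. */
static const syscall_fn demo_call_table[] = {
    [NR_demo_getanswer] = sys_demo_getanswer,
    [NR_demo_double]    = sys_demo_double,
};

/* Derived from the table size, so it never needs editing by hand. */
#define DEMO_NR_SYSCALLS (sizeof demo_call_table / sizeof demo_call_table[0])

/* Look up call number `nr` and run its handler, or fail with -ENOSYS. */
long demo_dispatch(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_SYSCALLS || demo_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_call_table[nr](arg);
}
```

In this model, demo_dispatch(NR_demo_double, 21) returns 42, while an out-of-range number such as 99 returns -ENOSYS.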
At this point, you will have to either add a syscall to your libraries, or use the proper _syscalln() macro in your user program for your programs to access the new system call. The 386DX Microprocessor Programmer's Reference Manual is a helpful reference, as is James Turley's Advanced 80386 Programming Techniques. See the Annotated Bibliography.
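
The idea behind the _syscalln() macros mentioned above is that a one-line macro invocation generates a small stub function that invokes the call by number. The sketch below imitates that idea with hypothetical DEMO_SYSCALL0 and DEMO_SYSCALL1 macros built on the C library's syscall() function. It is not the original macro text, and it uses existing calls (getpid, getppid, umask) because a newly added call number cannot be assumed here.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid, SYS_getppid, SYS_umask */
#include <sys/types.h>     /* pid_t, mode_t */
#include <unistd.h>        /* syscall() */

/* Define `type demo_<name>(void)` as a stub for call number SYS_<name>. */
#define DEMO_SYSCALL0(type, name) \
    type demo_##name(void) { return (type)syscall(SYS_##name); }

/* Same, for a call that takes one argument. */
#define DEMO_SYSCALL1(type, name, atype, a) \
    type demo_##name(atype a) { return (type)syscall(SYS_##name, (long)(a)); }

DEMO_SYSCALL0(pid_t, getpid)                /* pid_t demo_getpid(void)       */
DEMO_SYSCALL0(pid_t, getppid)               /* pid_t demo_getppid(void)      */
DEMO_SYSCALL1(mode_t, umask, mode_t, mask)  /* mode_t demo_umask(mode_t)     */
```

A program can then call demo_getpid() like any other function. For a new system call, the number would come from the #define added in step 4 instead of an existing SYS_ constant.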

Copyright (C) 1993, 1996 Michael K. Johnson.
Copyright (C) 1993 Stanley Scalsky