The HyperNews Linux KHG Discussion Pages

Supporting Functions

Here is a list of many of the most common supporting functions available to the device driver writer. If you find other supporting functions that are useful, please point them out to me. I know this is not a complete list, but I hope it is a helpful one.

add_request()

static void add_request(struct blk_dev_struct *dev, struct request * req)

This is a static function in ll_rw_block.c, and cannot be called by other code. However, an understanding of this function, as well as an understanding of ll_rw_block(), may help you understand the strategy routine.

If the device that the request is for has an empty request queue, the request is put on the queue and the strategy routine is called. Otherwise, the proper place in the queue is chosen and the request is inserted in the queue, maintaining proper order by insertion sort.

Proper order (the elevator algorithm) is defined as:

  1. Reads come before writes.
  2. Lower minor numbers come before higher minor numbers.
  3. Lower block numbers come before higher block numbers.

The elevator algorithm is implemented by the macro IN_ORDER(), which is defined in drivers/block/blk.h. [This may have changed somewhat recently, but it shouldn't matter to the driver writer anyway.]

Defined in: drivers/block/ll_rw_block.c
See also: make_request(), ll_rw_block().
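
The ordering rules above can be sketched in ordinary userspace C. This is only an illustration of the three rules and of insertion into a sorted list, not the kernel's add_request() or IN_ORDER(). The cut-down struct request and the names in_order() and sorted_insert() are made up for this example.

```c
#include <stddef.h>

/* Hypothetical, simplified request: only the fields the ordering needs. */
enum { READ = 0, WRITE = 1 };

struct request {
    int cmd;                 /* READ or WRITE */
    int minor;               /* minor device number */
    unsigned long sector;    /* starting block number */
    struct request *next;
};

/* Returns nonzero if a may be serviced before b, applying the three
 * rules in order: reads first, then lower minor, then lower block. */
int in_order(const struct request *a, const struct request *b)
{
    if (a->cmd != b->cmd)
        return a->cmd < b->cmd;
    if (a->minor != b->minor)
        return a->minor < b->minor;
    return a->sector <= b->sector;
}

/* Insert req into the sorted list at *head, keeping the list sorted. */
void sorted_insert(struct request **head, struct request *req)
{
    struct request **pp = head;

    /* Walk past every queued request that belongs before req. */
    while (*pp != NULL && in_order(*pp, req))
        pp = &(*pp)->next;
    req->next = *pp;
    *pp = req;
}
```

Inserting a write, then reads for minor 1 and minor 0, leaves the minor 0 reads at the front of the list (lowest block first), followed by the minor 1 read, with the write last.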

add_timer()

void add_timer(struct timer_list * timer)
#include <linux/timer.h>

Installs the timer structures in the list pointed to by timer into the kernel timer list.

The timer_list structure is defined by:

struct timer_list {
        struct timer_list *next;
        struct timer_list *prev;
        unsigned long expires;
        unsigned long data;
        void (*function)(unsigned long);
};

In order to call add_timer(), you need to allocate a timer_list structure, and then call init_timer(), passing it a pointer to your timer_list. It will nullify the next and prev elements, which is the correct initialization. If necessary, you can allocate multiple timer_list structures, and link them into a list. Do make sure that you properly initialize all the unused pointers to NULL, or the timer code may get very confused.

For each struct in your list, you set three variables:

expires
The number of jiffies (100ths of a second in Linux/86; thousandths or so in Linux/Alpha) after which to time out.
function
Kernel-space function to run after timeout has occurred.
data
Passed as the argument to function when function is called.

Having created this list, you give a pointer to the first (usually the only) element of the list as the argument to add_timer(). Having passed that pointer, keep a copy of the pointer handy, because you will need to use it to modify the elements of the list (to set a new timeout when you need a function called again, to change the function to be called, or to change the data that is passed to the function) and to delete the timer, if necessary.

Note: This is not process-specific. Therefore, if you want to wake a certain process at a timeout, you will have to use the sleep and wake primitives. The functions that you install through this mechanism will run in the same context that interrupt handlers run in.

Defined in: kernel/sched.c
See also: timer_table in include/linux/timer.h, init_timer(), del_timer().
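
The sequence described above (allocate, init_timer(), fill in expires, function, and data, then add_timer()) can be tried out in userspace with stand-in functions. Everything below other than the structure layout is an assumption made for the sketch: the mock init_timer(), add_timer(), and del_timer() only imitate the behavior described in the text, expires is treated as "ticks from now", and mock_jiffies, mock_tick(), my_timeout(), and arm_timeout() are hypothetical names.

```c
#include <stddef.h>

/* Same layout as the structure shown above. */
struct timer_list {
    struct timer_list *next;
    struct timer_list *prev;
    unsigned long expires;
    unsigned long data;
    void (*function)(unsigned long);
};

/* Userspace stand-ins, NOT the kernel implementation. */
unsigned long mock_jiffies = 0;             /* pretend tick counter */
struct timer_list *mock_timer_head = NULL;  /* list sorted by deadline */

void init_timer(struct timer_list *t)
{
    t->next = NULL;
    t->prev = NULL;
}

/* Convert the relative timeout to a deadline and insert in sorted order. */
void add_timer(struct timer_list *t)
{
    struct timer_list **pp = &mock_timer_head;
    struct timer_list *prev = NULL;

    t->expires += mock_jiffies;
    while (*pp != NULL && (*pp)->expires <= t->expires) {
        prev = *pp;
        pp = &(*pp)->next;
    }
    t->next = *pp;
    t->prev = prev;
    if (*pp != NULL)
        (*pp)->prev = t;
    *pp = t;
}

/* Unlink t; does nothing if t is not on the list. */
void del_timer(struct timer_list *t)
{
    if (t->prev != NULL)
        t->prev->next = t->next;
    else if (mock_timer_head == t)
        mock_timer_head = t->next;
    else
        return;
    if (t->next != NULL)
        t->next->prev = t->prev;
    t->next = t->prev = NULL;
}

/* Advance the mock clock and run every timer whose deadline has passed. */
void mock_tick(unsigned long ticks)
{
    mock_jiffies += ticks;
    while (mock_timer_head != NULL &&
           mock_timer_head->expires <= mock_jiffies) {
        struct timer_list *t = mock_timer_head;
        del_timer(t);
        t->function(t->data);
    }
}

/* Driver-side usage: a timeout handler and a helper that arms it. */
unsigned long last_timeout_data = 0;

void my_timeout(unsigned long data)
{
    last_timeout_data = data;   /* data is whatever was stored in the timer */
}

void arm_timeout(struct timer_list *t, unsigned long delay, unsigned long data)
{
    init_timer(t);              /* clears next and prev */
    t->expires = delay;
    t->function = my_timeout;
    t->data = data;
    add_timer(t);
}
```

The caller keeps the pointer it passed to arm_timeout(), so it can later call del_timer() on the same structure or re-arm it with a new timeout.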

cli()

#define cli() __asm__ __volatile__ ("cli"::)
#include <asm/system.h>

Prevents interrupts from being acknowledged. cli stands for ``CLear Interrupt enable''.

See also: sti()

del_timer()

void del_timer(struct timer_list * timer)
#include <linux/timer.h>

Deletes the timer structures in the list pointed to by timer from the kernel timer list.

The timer list that you delete must be the address of a timer list you have earlier installed with add_timer(). Once you have called del_timer() to delete the timer from the kernel timer list, you may deallocate the memory used in the timer_list structures, as it is no longer referenced by the kernel timer list.

Defined in: kernel/sched.c
See also: timer_table in include/linux/timer.h, init_timer(), add_timer().

end_request()

static void end_request(int uptodate)
#include "blk.h"

Called when a request has been satisfied or aborted. Takes one argument:

uptodate
If not equal to 0, means that the request has been satisfied.
If equal to 0, means that the request has not been satisfied.

If the request was satisfied (uptodate != 0), end_request() maintains the request list, unlocks the buffer, and may arrange for the scheduler to be run at the next convenient time (need_resched = 1; this is implicit in wake_up(), and is not explicitly part of end_request()), before waking up all processes sleeping on the wait_for_request event, which is slept on in make_request(), ll_rw_page(), and ll_rw_swap_file().

Note: This function is a static function, defined in drivers/block/blk.h for every non-SCSI device that includes blk.h. (SCSI devices do this differently; the high-level SCSI code itself provides this functionality to the low-level device-specific SCSI device drivers.) It includes several defines dependent on static device information, such as the device number. This is marginally faster than a more generic normal C function.

Defined in: drivers/block/blk.h
See also: ll_rw_block(), add_request(), make_request().

free_irq()

void free_irq(unsigned int irq)
#include <linux/sched.h>

Frees an irq previously acquired with request_irq() or irqaction(). Takes one argument:

irq
interrupt level to free.

Defined in: kernel/irq.c
See also: request_irq(), irqaction().

get_user()

#define get_user(ptr) ((__typeof__(*(ptr)))__get_user((ptr),sizeof(*(ptr))))
#include <asm/segment.h>

Allows a driver to access data in user space, which is in a different segment than the kernel. Derives the type of the argument and the return type automatically. This means that you have to use types correctly. Shoddy typing will simply fail to work.

[Caution!] Note: these functions may cause implicit I/O, if the memory being accessed has been swapped out, and therefore pre-emption may occur at this point. Do not include these functions in critical sections of your code even if the critical sections are protected by cli()/sti() pairs, because that implicit I/O will violate the integrity of your cli()/sti() pair. If you need to get at user-space memory, copy it to kernel-space memory before you enter your critical section.

These functions take one argument:

addr
Address to get data from.
Returns:
Data at that offset in user space.

Defined in: include/asm/segment.h
See also: memcpy_*fs(), put_user(), cli(), sti().
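
The way the macro derives the access size from the pointer type can be seen in a userspace sketch. Here mock_get_user_bytes() is a hypothetical stand-in for __get_user() that reads ordinary memory rather than the user segment, and read_u16() and read_first_byte() are made-up helpers. The sketch relies on the GNU C __typeof__ extension, as the original macro does.

```c
#include <string.h>

/* Hypothetical stand-in for __get_user(): reads size bytes of ordinary
 * memory and returns them widened to unsigned long. */
unsigned long mock_get_user_bytes(const void *ptr, int size)
{
    unsigned char b;
    unsigned short w;
    unsigned int l;

    switch (size) {
    case 1: memcpy(&b, ptr, 1); return b;
    case 2: memcpy(&w, ptr, 2); return w;
    case 4: memcpy(&l, ptr, 4); return l;
    default: return 0;
    }
}

/* Same shape as the macro shown above: the size and the result type
 * both follow from the type that ptr points to. */
#define get_user(ptr) \
    ((__typeof__(*(ptr)))mock_get_user_bytes((ptr), sizeof(*(ptr))))

/* Correctly typed pointer: two bytes are fetched. */
unsigned short read_u16(const unsigned short *p)
{
    return get_user(p);
}

/* Same address through a byte pointer: only one byte is fetched. */
unsigned char read_first_byte(const unsigned short *p)
{
    return get_user((const unsigned char *)p);
}
```

Passing a pointer of the wrong type therefore silently changes how much data is read, which is why the pointer type has to match the data.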

inb(), inb_p()

inline unsigned int inb(unsigned short port)
inline unsigned int inb_p(unsigned short port)
#include <asm/io.h>

Reads a byte from a port. inb() goes as fast as it can, while inb_p() pauses before returning. Some devices are happier if you don't read from them as fast as possible. Both functions take one argument:

port
Port to read byte from.
Returns:
The byte is returned in the low byte of the 32-bit integer, and the 3 high bytes are unused, and may be garbage.

Defined in: include/asm/io.h
See also: outb(), outb_p().
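
Because the upper three bytes of the return value may be garbage, a caller can mask the result down to the low byte. The sketch below uses mock_inb(), a hypothetical userspace stand-in that deliberately returns junk in the high bytes; it does not touch any hardware, and read_port_byte() is a made-up helper.

```c
/* Hypothetical stand-in for inb(): the low byte carries the data (here
 * simply the low byte of the port number), the high bytes are junk. */
unsigned int mock_inb(unsigned short port)
{
    return 0xdeadbe00u | (port & 0xffu);
}

/* Keep only the byte that was actually read. */
unsigned char read_port_byte(unsigned short port)
{
    return (unsigned char)(mock_inb(port) & 0xff);
}
```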

init_timer()

Inline function for initializing timer_list structures for use with add_timer().

Defined in: include/linux/timer.h
See also: add_timer().

irqaction()

int irqaction(unsigned int irq, struct sigaction *new)
#include <linux/sched.h>

Hardware interrupts are really a lot like signals. Therefore, it makes sense to be able to register an interrupt like a signal. The sa_restorer() field of the struct sigaction is not used, but otherwise it is the same. The int argument to the sa_handler() function may mean different things, depending on whether or not the IRQ is installed with the SA_INTERRUPT flag. If it is not installed with the SA_INTERRUPT flag, then the argument passed to the handler is a pointer to a register structure, and if it is installed with the SA_INTERRUPT flag, then the argument passed is the number of the IRQ. For an example of a handler set to use the SA_INTERRUPT flag, look at how rs_interrupt() is installed in drivers/char/serial.c.

The SA_INTERRUPT flag is used to determine whether or not the interrupt should be a ``fast'' interrupt. Normally, upon return from the interrupt, need_resched, a global flag, is checked. If it is set (!= 0), then schedule() is run, which may schedule another process to run. Normal interrupts are also run with all other interrupts still enabled. However, by setting the sigaction structure member sa_flags to SA_INTERRUPT, ``fast'' interrupts are chosen, which leave out some processing, and very specifically do not call schedule().

irqaction() takes two arguments:

irq
The number of the IRQ the driver wishes to acquire.
new
A pointer to a sigaction struct.
Returns:
-EBUSY if the interrupt has already been acquired,
-EINVAL if sa_handler() is NULL,
0 on success.

Defined in: kernel/irq.c
See also: request_irq(), free_irq()

IS_*(inode)

IS_RDONLY(inode) ((inode)->i_flags & MS_RDONLY)
IS_NOSUID(inode) ((inode)->i_flags & MS_NOSUID)
IS_NODEV(inode) ((inode)->i_flags & MS_NODEV)
IS_NOEXEC(inode) ((inode)->i_flags & MS_NOEXEC)
IS_SYNC(inode) ((inode)->i_flags & MS_SYNC)
#include <linux/fs.h>

These five test to see if the inode is on a filesystem mounted with the corresponding flag.
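
A small userspace sketch of how these tests behave. The cut-down struct inode, the flag values, and demo_write_allowed() are assumptions made for the example; on a real kernel the definitions come from <linux/fs.h>.

```c
/* Mount-flag values assumed for this sketch; check <linux/fs.h>. */
#define MS_RDONLY 1
#define MS_NOSUID 2
#define MS_NODEV  4
#define MS_NOEXEC 8
#define MS_SYNC   16

/* Hypothetical cut-down inode: only the field the macros look at. */
struct inode {
    unsigned int i_flags;
};

#define IS_RDONLY(inode) ((inode)->i_flags & MS_RDONLY)
#define IS_NOEXEC(inode) ((inode)->i_flags & MS_NOEXEC)

/* A write path might refuse early when the filesystem is read-only. */
int demo_write_allowed(const struct inode *inode)
{
    return IS_RDONLY(inode) ? 0 : 1;
}
```

Note that each macro yields the masked bit itself rather than 0 or 1, so the result should be used as a truth value and not compared against 1.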

kfree*()

#define kfree(x) kfree_s((x), 0)
void kfree_s(void * obj, int size)
#include <linux/malloc.h>

Free memory previously allocated with kmalloc(). There are two possible arguments:

obj
Pointer to kernel memory to free.
size
To speed this up, if you know the size, use kfree_s() and provide the correct size. This way, the kernel memory allocator knows which bucket cache the object belongs to, and doesn't have to search all of the buckets. (For more details on this terminology, read mm/kmalloc.c.)

[kfree_s() may be obsolete now.]

Defined in: mm/kmalloc.c, include/linux/malloc.h
See also: kmalloc().

kmalloc()

void * kmalloc(unsigned int len, int priority)
#include <linux/kernel.h>

kmalloc() used to be limited to 4096 bytes. It is now limited to 131056 bytes ((32*4096)-16) on Linux/Intel, and twice that on platforms such as Alpha with 8Kb pages. Buckets, which used to be all exact powers of 2, are now a power of 2 minus some small number, except for numbers less than or equal to 128. For more details, see the implementation in mm/kmalloc.c.

kmalloc() takes two arguments:

len
Length of memory to allocate. If the maximum is exceeded, kmalloc will log an error message of ``kmalloc of too large a block (%d bytes).'' and return NULL.
priority
GFP_KERNEL or GFP_ATOMIC. If GFP_KERNEL is chosen, kmalloc() may sleep, allowing pre-emption to occur. This is the normal way of calling kmalloc(). However, there are cases where it is better to return immediately if no pages are available, without attempting to sleep to find one. One of the places in which this is true is in the swapping code, because it could cause race conditions, and another is in the networking code, where things can happen much faster than they could be handled by swapping to disk to make space for giving the networking code more memory. The most important reason for using GFP_ATOMIC is if it is being called from an interrupt, when you cannot sleep, and cannot receive other interrupts.
Returns:
NULL on failure.
Pointer to allocated memory on success.

Defined in: mm/kmalloc.c
See also: kfree()

ll_rw_block()

void ll_rw_block(int rw, int nr, struct buffer_head *bh[])
#include <linux/fs.h>

No device driver will ever call this code: it is called only through the buffer cache. However, an understanding of this function may help you understand the function of the strategy routine.

After sanity checking, if there are no pending requests on the device's request queue, ll_rw_block() ``plugs'' the queue so that the requests don't go out until all the requests are in the queue, sorted by the elevator algorithm. make_request() is then called for each request. If the queue had to be plugged, then the strategy routine for that device is not active, and it is called, with interrupts disabled. It is the responsibility of the strategy routine to re-enable interrupts.

Defined in: drivers/block/ll_rw_block.c
See also: make_request(), add_request().

MAJOR()

#define MAJOR(a) (((unsigned)(a))>>8)
#include <linux/fs.h>

This takes a 16 bit device number and gives the associated major number by shifting off the minor number.

See also: MINOR().

make_request()

static void make_request(int major, int rw, struct buffer_head *bh)

This is a static function in ll_rw_block.c, and cannot be called by other code. However, an understanding of this function, as well as an understanding of ll_rw_block(), may help you understand the strategy routine.

make_request() first checks to see if the request is readahead or writeahead and the buffer is locked. If so, it simply ignores the request and returns. Otherwise, it locks the buffer and, except for SCSI devices, checks to make sure that write requests don't fill the queue, as read requests should take precedence.

If no spaces are available in the queue, and the request is neither readahead nor writeahead, make_request() sleeps on the event wait_for_request, and tries again when woken. When a space in the queue is found, the request information is filled in and add_request() is called to actually add the request to the queue.

Defined in: drivers/block/ll_rw_block.c
See also: add_request(), ll_rw_block().

MINOR()

#define MINOR(a) ((a)&0xff)
#include <linux/fs.h>

This takes a 16 bit device number and gives the associated minor number by masking off the major number.

See also: MAJOR().
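
A short worked example of the two macros on a 16 bit device number. The macros are copied from the definitions given in this document; demo_mkdev() is a hypothetical helper added only to show the inverse operation, and the device number 0x0301 is an arbitrary value.

```c
#define MAJOR(a) (((unsigned)(a)) >> 8)
#define MINOR(a) ((a) & 0xff)

/* Hypothetical inverse: pack a major and a minor number into 16 bits. */
unsigned int demo_mkdev(unsigned int major, unsigned int minor)
{
    return (major << 8) | (minor & 0xff);
}
```

For the device number 0x0301, MAJOR() gives 3 (the high byte) and MINOR() gives 1 (the low byte); demo_mkdev(3, 1) gives 0x0301 back.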

memcpy_*fs()

inline void memcpy_tofs(void * to, const void * from, unsigned long n)
inline void memcpy_fromfs(void * to, const void * from, unsigned long n)
#include <asm/segment.h>

Copies memory between user space and kernel space in chunks larger than one byte, word, or long. Be very careful to get the order of the arguments right!

[Caution!] Note: these functions may cause implicit I/O, if the memory being accessed has been swapped out, and therefore pre-emption may occur at this point. Do not include these functions in critical sections of your code, even if the critical sections are protected by cli()/sti() pairs, because implicit I/O will violate the cli() protection. If you need to get at user-space memory, copy it to kernel-space memory before you enter your critical section.

These functions take three arguments:

to
Address to copy data to.
from
Address to copy data from.
n
Number of bytes to copy.

Defined in: include/asm/segment.h
See also: get_user(), put_user(), cli(), sti().
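
The argument order and the advice about critical sections can be illustrated with userspace stand-ins. In the names of these functions, "fs" refers to user space: memcpy_tofs() copies from kernel space to user space, and memcpy_fromfs() copies from user space to kernel space. Everything else below is an assumption made for the sketch: the mock bodies just call memcpy(), cli() and sti() only toggle a flag, and struct dev_config, set_config(), and get_config() are hypothetical.

```c
#include <string.h>

/* Userspace stand-ins; the real functions cross the user/kernel boundary. */
void memcpy_tofs(void *to, const void *from, unsigned long n)
{
    memcpy(to, from, n);    /* kernel -> user in the real kernel */
}

void memcpy_fromfs(void *to, const void *from, unsigned long n)
{
    memcpy(to, from, n);    /* user -> kernel in the real kernel */
}

int mock_irqs_enabled = 1;
void cli(void) { mock_irqs_enabled = 0; }
void sti(void) { mock_irqs_enabled = 1; }

/* Hypothetical driver state shared with an interrupt handler. */
struct dev_config {
    int speed;
    int mode;
};
struct dev_config active_config;

/* Copy from user space first, then update shared state with interrupts off. */
void set_config(const struct dev_config *user_ptr)
{
    struct dev_config tmp;

    memcpy_fromfs(&tmp, user_ptr, sizeof(tmp)); /* may sleep: keep outside */
    cli();
    active_config = tmp;                        /* short critical section */
    sti();
}

/* Snapshot shared state with interrupts off, then copy out to user space. */
void get_config(struct dev_config *user_ptr)
{
    struct dev_config tmp;

    cli();
    tmp = active_config;
    sti();
    memcpy_tofs(user_ptr, &tmp, sizeof(tmp));   /* may sleep: keep outside */
}
```

In both directions the copy that might sleep happens on a local kernel-space temporary, so the code between cli() and sti() never touches user memory.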

outb(), outb_p()

inline void outb(char value, unsigned short port)
inline void outb_p(char value, unsigned short port)
#include <asm/io.h>

Writes a byte to a port. outb() goes as fast as it can, while outb_p() pauses before returning. Some devices are happier if you don't write to them as fast as possible. Both functions take two arguments:

value
The byte to write.
port
Port to write byte to.

Defined in: include/asm/io.h
See also: inb(), inb_p().

printk()

int printk(const char* fmt, ...)
#include <linux/kernel.h>

printk() is a version of printf() for the kernel, with some restrictions. It cannot handle floats, and has a few other limitations, which are documented in kernel/vsprintf.c. It takes a variable number of arguments:

fmt
Format string, printf() style.
...
The rest of the arguments, printf() style.
Returns:
Number of bytes written.

[Caution!] Note: printk() may cause implicit I/O, if the memory being accessed has been swapped out, and therefore pre-emption may occur at this point. Also, printk() will set the interrupt enable flag, so never use it in code protected by cli(). Because it causes I/O, it is not safe to use in protected code anyway, even if it didn't set the interrupt enable flag.

Defined in: kernel/printk.c.

put_user()

#define put_user(x,ptr) __put_user((unsigned long)(x),(ptr),sizeof(*(ptr)))
#include <asm/segment.h>

Allows a driver to write data in user space, which is in a different segment than the kernel. Derives the type of the arguments and the storage size automatically. This means that you have to use types correctly. Shoddy typing will simply fail to work.

[Caution!] Note: these functions may cause implicit I/O, if the memory being accessed has been swapped out, and therefore pre-emption may occur at this point. Do not include these functions in critical sections of your code even if the critical sections are protected by cli()/sti() pairs, because that implicit I/O will violate the integrity of your cli()/sti() pair. If you need to get at user-space memory, copy it to kernel-space memory before you enter your critical section.

These functions take two arguments:

val
Value to write
addr
Address to write data to.

Defined in: include/asm/segment.h
See also: memcpy_*fs(), get_user(), cli(), sti().

register_*dev()

int register_chrdev(unsigned int major, const char *name, struct file_operations *fops)
int register_blkdev(unsigned int major, const char *name, struct file_operations *fops)
#include <linux/fs.h>
#include <linux/errno.h>

Registers a device with the kernel, letting the kernel check to make sure that no other driver has already grabbed the same major number. Takes three arguments:

major
Major number of device being registered.
name
Unique string identifying driver. Used in the output for the /proc/devices file.
fops
Pointer to a file_operations structure for that device. This must not be NULL, or the kernel will panic later.
Returns:
-EINVAL if major is >= MAX_CHRDEV or MAX_BLKDEV (defined in <linux/fs.h>), for character or block devices, respectively.
-EBUSY if major device number has already been allocated.
0 on success.

Defined in: fs/devices.c
See also: unregister_*dev()

request_irq()

int request_irq(unsigned int irq, void (*handler)(int), unsigned long flags, const char *device)
#include <linux/sched.h>
#include <linux/errno.h>

Request an IRQ from the kernel, and install an IRQ interrupt handler if successful. Takes four arguments:

irq
The IRQ being requested.
handler
The handler to be called when the IRQ occurs. The argument to the handler function will be the number of the IRQ that it was invoked to handle.
flags
Set to SA_INTERRUPT to request a ``fast'' interrupt or 0 to request a normal, ``slow'' one.
device
A string containing the name of the device driver.
Returns:
-EINVAL if irq > 15 or handler = NULL.
-EBUSY if irq is already allocated.
0 on success.

If you need more functionality in your interrupt handling, use the irqaction() function. This uses most of the capabilities of the sigaction structure to provide interrupt services similar to the signal services provided by sigaction() to user-level programs.

Defined in: kernel/irq.c
See also: free_irq(), irqaction().
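
The return values listed above can be exercised with a userspace mock. This is a sketch under stated assumptions, not the code in kernel/irq.c: mock_irq_table, demo_handler(), and demo_init() are hypothetical names, the flags and device arguments are ignored, and the value given to SA_INTERRUPT is assumed for the example.

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_NR_IRQS 16
#define SA_INTERRUPT 0x20000000   /* value assumed for this sketch */

/* One handler slot per IRQ; NULL means the IRQ is free. */
void (*mock_irq_table[MOCK_NR_IRQS])(int);

/* Mock with the same signature and return codes as described above. */
int request_irq(unsigned int irq, void (*handler)(int),
                unsigned long flags, const char *device)
{
    (void)flags;
    (void)device;
    if (irq > 15 || handler == NULL)
        return -EINVAL;
    if (mock_irq_table[irq] != NULL)
        return -EBUSY;
    mock_irq_table[irq] = handler;
    return 0;
}

void free_irq(unsigned int irq)
{
    if (irq < MOCK_NR_IRQS)
        mock_irq_table[irq] = NULL;
}

/* Driver side: the handler receives the IRQ number it was invoked for. */
int demo_irq_count = 0;

void demo_handler(int irq)
{
    (void)irq;
    demo_irq_count++;
}

/* Driver initialization: claim the IRQ and pass any error to the caller. */
int demo_init(unsigned int irq)
{
    return request_irq(irq, demo_handler, SA_INTERRUPT, "demo");
}
```

A driver would typically check the result of its initialization call and give up cleanly on -EBUSY or -EINVAL, and call free_irq() with the same IRQ number when it is done with the device.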

select_wait()

inline void select_wait(struct wait_queue **wait_address, select_table *p)
#include <linux/sched.h>

Add a process to the proper select_wait queue. This function takes two arguments:

wait_address
Address of a wait_queue pointer to add to the circular list of waits.
p
If p is NULL, select_wait() does nothing; otherwise the current process is put to sleep. This should be the select_table *wait variable that was passed to your select() function.

Defined in: include/linux/sched.h
See also: *sleep_on(), wake_up*()

*sleep_on()

void sleep_on(struct wait_queue ** p)
void interruptible_sleep_on(struct wait_queue ** p)
#include <linux/sched.h>

Sleep on an event, putting a wait_queue entry in the list so that the process can be woken on that event. sleep_on() goes into an uninterruptible sleep: the only way the process can run is to be woken by wake_up(). interruptible_sleep_on() goes into an interruptible sleep: signals and process timeouts will also cause the process to wake up. Otherwise, a call to wake_up_interruptible() is necessary to wake up the process and allow it to continue running where it left off. Both take one argument:

p
Pointer to a proper wait_queue structure that records the information needed to wake the process.

Defined in: kernel/sched.c
See also: select_wait(), wake_up*().

sti()

#define sti() __asm__ __volatile__ ("sti"::)
#include <asm/system.h>

Allows interrupts to be acknowledged. sti stands for ``SeT Interrupt enable''.

Defined in: asm/system.h
See also: cli().

sys_get*()

int sys_getpid(void)
int sys_getuid(void)
int sys_getgid(void)
int sys_geteuid(void)
int sys_getegid(void)
int sys_getppid(void)
int sys_getpgrp(void)

These system calls may be used to get the information described in the table below, or the information can be extracted directly from the process table, like this:
foo = current->pid;

pid     Process ID
uid     User ID
gid     Group ID
euid    Effective user ID
egid    Effective group ID
ppid    Process ID of process' parent process
pgid    Process group ID

The system calls should not be used because they are slower and take more space. Because of this, they are no longer exported as symbols throughout the whole kernel.

Defined in: kernel/sched.c

unregister_*dev()

int unregister_chrdev(unsigned int major, const char *name)
int unregister_blkdev(unsigned int major, const char *name)
#include <linux/fs.h>
#include <linux/errno.h>

Removes the registration of a device with the kernel, letting the kernel give the major number to some other device. Takes two arguments:

major
Major number of device being unregistered. Must be the same number given to register_*dev().
name
Unique string identifying driver. Must be the same name given to register_*dev().
Returns:
-EINVAL if major is >= MAX_CHRDEV or MAX_BLKDEV (defined in <linux/fs.h>), for character or block devices, respectively, or if there have not been file operations registered for major device major, or if name is not the same name that the device was registered with.
0 on success.

Defined in: fs/devices.c
See also: register_*dev()
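
The register and unregister return values can be tried out together in a userspace mock. This is a sketch of the behavior described in this document, not the code in fs/devices.c: the table mock_chrdevs, the stub struct file_operations, demo_fops, and the value chosen for MAX_CHRDEV are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHRDEV 32   /* value assumed for this sketch */

/* Hypothetical stub; the real structure holds the driver entry points. */
struct file_operations {
    int placeholder;
};

struct mock_dev {
    const char *name;
    struct file_operations *fops;
};

/* One slot per major number; fops == NULL means the major is free. */
struct mock_dev mock_chrdevs[MAX_CHRDEV];

int register_chrdev(unsigned int major, const char *name,
                    struct file_operations *fops)
{
    if (major >= MAX_CHRDEV)
        return -EINVAL;
    if (mock_chrdevs[major].fops != NULL)
        return -EBUSY;
    mock_chrdevs[major].name = name;
    mock_chrdevs[major].fops = fops;
    return 0;
}

int unregister_chrdev(unsigned int major, const char *name)
{
    if (major >= MAX_CHRDEV)
        return -EINVAL;
    if (mock_chrdevs[major].fops == NULL)
        return -EINVAL;     /* nothing registered for this major */
    if (strcmp(mock_chrdevs[major].name, name) != 0)
        return -EINVAL;     /* name differs from the registered one */
    mock_chrdevs[major].name = NULL;
    mock_chrdevs[major].fops = NULL;
    return 0;
}

/* The file operations a hypothetical driver would register. */
struct file_operations demo_fops;
```

Registering major 21 as "demo" succeeds once and then returns -EBUSY; unregistering it under a different name returns -EINVAL, and only the matching major and name pair frees the slot for another driver.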

wake_up*()

void wake_up(struct wait_queue ** p)
void wake_up_interruptible(struct wait_queue ** p)
#include <linux/sched.h>

Wakes up a process that has been put to sleep by the matching *sleep_on() function. wake_up() can be used to wake up tasks in a queue where the tasks may be in a TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state, while wake_up_interruptible() will only wake up tasks in a TASK_INTERRUPTIBLE state, and will be insignificantly faster than wake_up() on queues that have only interruptible tasks. These take one argument:

p
Pointer to the wait_queue structure of the process to be woken.

Note that wake_up() does not switch tasks, it only makes processes that are woken up runnable, so that the next time schedule() is called, they will be candidates to run.

Defined in: kernel/sched.c
See also: select_wait(), *sleep_on()
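
The difference between the two wake-up calls can be modeled in userspace as a state change on sleeping tasks. This is a toy model, not the kernel's wait queue code: struct mock_wait_queue and the mock_ functions are hypothetical, and the numeric values of the task states are whatever the enum assigns here.

```c
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

#define MOCK_QUEUE_MAX 8

/* Hypothetical flat wait queue: one state per sleeping task. */
struct mock_wait_queue {
    enum task_state state[MOCK_QUEUE_MAX];
    int count;
};

/* Record a sleeper in the given state; returns its slot, or -1 if full. */
int mock_enqueue(struct mock_wait_queue *q, enum task_state s)
{
    if (q->count >= MOCK_QUEUE_MAX)
        return -1;
    q->state[q->count] = s;
    return q->count++;
}

/* Like wake_up(): sleepers in either sleeping state become runnable. */
int mock_wake_up(struct mock_wait_queue *q)
{
    int i, woken = 0;

    for (i = 0; i < q->count; i++) {
        if (q->state[i] != TASK_RUNNING) {
            q->state[i] = TASK_RUNNING;
            woken++;
        }
    }
    return woken;
}

/* Like wake_up_interruptible(): only TASK_INTERRUPTIBLE sleepers. */
int mock_wake_up_interruptible(struct mock_wait_queue *q)
{
    int i, woken = 0;

    for (i = 0; i < q->count; i++) {
        if (q->state[i] == TASK_INTERRUPTIBLE) {
            q->state[i] = TASK_RUNNING;
            woken++;
        }
    }
    return woken;
}
```

As in the text, waking a task here only marks it runnable; nothing in the model switches tasks.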

Copyright (C) 1992, 1993, 1994, 1996 Michael K. Johnson.

