nyuRay - A Ray towards Technology: January 2009

Monday, January 12, 2009

Linux - Threads - Mutex

Mutexes:

Mutexes have two basic operations, lock and unlock. If a mutex is unlocked and a thread calls lock, the mutex locks and the thread continues. If however the mutex is locked, the thread blocks until the thread 'holding' the lock calls unlock.

There are 5 basic functions dealing with mutexes.

int pthread_mutex_init (pthread_mutex_t *mut, const pthread_mutexattr_t *attr);

Note that you pass a pointer to the mutex; to use the default attributes, just pass NULL for the second parameter.

int pthread_mutex_lock (pthread_mutex_t *mut);

Locks the mutex :).

int pthread_mutex_unlock (pthread_mutex_t *mut);

Unlocks the mutex :).

int pthread_mutex_trylock (pthread_mutex_t *mut);

Either acquires the lock if it is available, or returns EBUSY.

int pthread_mutex_destroy (pthread_mutex_t *mut);

Releases any resources associated with the mutex. The memory of the pthread_mutex_t object itself is not freed; that stays the caller's job.

THREAD 1                        THREAD 2

pthread_mutex_lock (&mut);
                                pthread_mutex_lock (&mut);
a = data;                       /* blocked */
a++;                            /* blocked */
data = a;                       /* blocked */
pthread_mutex_unlock (&mut);    /* blocked */
                                b = data;
                                b--;
                                data = b;
                                pthread_mutex_unlock (&mut);

[data is fine. The data race is gone.]
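The two-thread schedule above can be turned into a small program. This is only a sketch: the names incrementer, decrementer, run_counter_demo and ITERATIONS are made up for illustration, and the mutex is set up with the static initializer PTHREAD_MUTEX_INITIALIZER instead of pthread_mutex_init().

```c
#include <pthread.h>

#define ITERATIONS 100000   /* hypothetical loop count, chosen for illustration */

static int data = 0;                                     /* shared variable */
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;  /* default (fast) mutex */

/* Thread 1: a = data; a++; data = a; done under the lock each time. */
static void *incrementer(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&mut);
        int a = data;
        a++;
        data = a;
        pthread_mutex_unlock(&mut);
    }
    return NULL;
}

/* Thread 2: b = data; b--; data = b; done under the same lock. */
static void *decrementer(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&mut);
        int b = data;
        b--;
        data = b;
        pthread_mutex_unlock(&mut);
    }
    return NULL;
}

/* Runs both threads to completion and returns the final value of data,
   or -1 if a thread could not be created. */
int run_counter_demo(void)
{
    pthread_t t1, t2;

    data = 0;
    if (pthread_create(&t1, NULL, incrementer, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, decrementer, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return data;
}
```

Because every read-modify-write happens while the mutex is held, the increments and decrements cancel out exactly. Without the lock and unlock calls the final value would be unpredictable.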

Condition Variables:

A condition variable puts a thread to sleep until some condition, which another thread will make true, is satisfied.

pthread_mutex_lock(&mutex);                /* lock mutex */
while (!predicate) {                       /* check predicate */
    pthread_cond_wait(&condvar, &mutex);   /* go to sleep - recheck pred on awakening */
}
pthread_mutex_unlock(&mutex);              /* unlock mutex */

When pthread_cond_wait() is called, the mutex is unlocked and the thread goes to sleep waiting for the condition. We need to wake up this thread from another thread:

pthread_mutex_lock(&mutex);          /* lock the mutex       */
predicate = 1;                       /* set the predicate    */
pthread_cond_broadcast(&condvar);    /* wake everyone up     */
pthread_mutex_unlock(&mutex);        /* unlock the mutex     */

pthread_cond_broadcast() will wake up all the threads waiting for that condition.

When woken up, pthread_cond_wait() re-locks the mutex before it returns, and the thread then carries on with further processing.
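Putting the waiting side and the waking side together gives something like the sketch below. The names waiter, run_broadcast_demo, woken and WAITERS are hypothetical, and the mutex and condition variable use the static initializers rather than the init functions.

```c
#include <pthread.h>

#define WAITERS 3   /* hypothetical number of sleeping threads */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condvar = PTHREAD_COND_INITIALIZER;
static int predicate = 0;   /* the condition the waiters sleep on */
static int woken = 0;       /* how many waiters saw the predicate become true */

/* Sleeps until predicate is set, then records that it woke up. */
static void *waiter(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&mutex);
    while (!predicate)                        /* recheck after every wakeup */
        pthread_cond_wait(&condvar, &mutex);  /* unlocks while asleep, relocks on return */
    woken++;                                  /* safe: the mutex is held again here */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Starts the waiters, wakes them all with one broadcast, and returns
   how many of them woke up. */
int run_broadcast_demo(void)
{
    pthread_t threads[WAITERS];
    int created = 0;

    predicate = 0;
    woken = 0;
    for (int i = 0; i < WAITERS; i++) {
        if (pthread_create(&threads[created], NULL, waiter, NULL) != 0)
            break;
        created++;
    }

    pthread_mutex_lock(&mutex);
    predicate = 1;
    pthread_cond_broadcast(&condvar);
    pthread_mutex_unlock(&mutex);

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return woken;
}
```

The while loop around pthread_cond_wait() matters: a waiter that starts late simply finds predicate already set and never sleeps, so the broadcast cannot be missed.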

List of major pthreads routines

pthread_cancel - cancel running thread
pthread_create - Creates a thread
pthread_cond_broadcast - Wakes up all the threads that are waiting on the condition variable.
pthread_cond_destroy - Destroy a condition variable object. The condition variable should be unused, with no threads waiting for it. The memory for the object is left intact; it is up to the caller to deallocate it.
pthread_cond_init - Initialize a condition variable object, using the provided condition attributes object
pthread_cond_signal - Wakes up only one of the threads waiting on a condition variable, typically the highest-priority waiter.
pthread_cond_wait - The current thread is made to wait until the condition variable is signaled or broadcast. The mutex is released prior to waiting, and reacquired before returning.
pthread_detach - The thread is detached from its parent (the one that created it), so there is no need to join it later. If the thread has already exited, its resources are released.
pthread_exit - The current thread is terminated, with its status value made available to the parent using pthread_join.
pthread_join - The current thread indicates that it would like to join with the target thread specified by tid. If the target thread has already terminated, its exit status is provided immediately to the caller. If the target thread has not yet exited, the caller is made to wait. Once the target has exited, all of the threads waiting to join with it are woken up, and the target’s exit status provided to each.
pthread_mutex_destroy - The mutex object is destroyed, although the memory for the object is not deallocated. The mutex must not be held.
pthread_mutex_init - Initialize the mutex
pthread_mutex_lock - lock the mutex
pthread_mutex_unlock - unlock the mutex
pthread_mutex_trylock - If we call pthread_mutex_trylock on an unlocked mutex, we lock the mutex as if we had called pthread_mutex_lock, and pthread_mutex_trylock returns zero. However, if the mutex is already locked by another thread, pthread_mutex_trylock will not block. Instead, it returns immediately with the error code EBUSY. The mutex lock held by the other thread is not affected. We may try again later to lock the mutex.

We have 3 kinds of mutexes

fast mutex (by default all mutex are of this type)

recursive mutex

error-checking mutex

If we call pthread_mutex_lock on a fast mutex two times in the same thread without unlocking, this will cause deadlock.

A recursive mutex may safely be locked many times by the same thread. The mutex remembers how many times pthread_mutex_lock was called on it by the thread that holds the lock; that thread must make the same number of calls to pthread_mutex_unlock before the mutex is actually unlocked and another thread is allowed to lock it.

By default, Linux mutexes are fast mutexes. We can modify it by using mutex attributes as shown below

pthread_mutexattr_t attr;

pthread_mutex_t mutex;

pthread_mutexattr_init (&attr);

pthread_mutexattr_setkind_np (&attr, PTHREAD_MUTEX_RECURSIVE_NP); // we can use PTHREAD_MUTEX_ERRORCHECK_NP for an error-checking mutex; the portable POSIX call is pthread_mutexattr_settype with PTHREAD_MUTEX_RECURSIVE

pthread_mutex_init (&mutex, &attr);

pthread_mutexattr_destroy (&attr);

Friday, January 9, 2009

Linux - Threads-1

Threads, like processes, are a mechanism to allow a program to do more than one thing at a time.

Linux schedules threads asynchronously, interrupting each thread from time to time to give others a chance to execute.

Threads exist within a process. When we invoke a program, Linux creates a new process and in that process creates a single thread, which runs sequentially. That thread can create additional threads; all these threads run the same program in the same process, but each thread may be executing a different part of the program at any given time.

fork() creates a child process with its parent's virtual memory, file descriptors and so on copied. With threads, by contrast, nothing is copied when a program creates another thread. The created thread shares the same memory space, file descriptors, and other system resources as the original. If a thread changes the value of a variable, the other thread will subsequently see the modification.

If any thread inside a process calls one of the exec functions, all the other threads are ended.

Linux implements the POSIX thread API (known as pthreads).
All thread functions and data types are declared in the header file <pthread.h>.

The thread functions are not in the standard C library; instead they are in libpthread, so we should add -lpthread to the command line when we link our program.

Each thread is identified by a thread ID, whose data type is pthread_t.

Upon thread creation, each thread executes a thread_function. This is an ordinary function and contains the code that thread should run. When the function returns, the thread exits.

pthread_create() function creates a new thread.

A call to pthread_create() returns immediately, and the original thread continues executing the instructions following the call.

If successful, the pthread_create() function returns zero. Otherwise, an error number is returned to indicate the error.

ex:
pthread_create(&pid, NULL, &print_x, NULL);
/* pid - thread id of type pthread_t
NULL (2nd parameter) - specifies the thread attributes. NULL is for default attributes
&print_x - 3rd parameter is the pointer to the function that the thread should run
NULL - 4th parameter is the pointer to the parameters that have to be passed to the thread function. */

Generally, all the parameters required by the function that the thread should call are placed in a structure and the pointer to the structure is passed when the thread is created.

Ex:
#include <pthread.h>
#include <stdio.h>

struct char_print_params // structure for passing params to thread func
{
    char c;
    int count;
};

void* char_print(void *params) // thread func
{
    struct char_print_params *p = (struct char_print_params*)params;
    int i;
    for(i=0;i<p->count;i++)
        fputc(p->c, stdout);
    return NULL;
}

int main() {
    pthread_t pid1,pid2;
    struct char_print_params c1,c2;

    c1.c = 'x';
    c1.count = 200;

    c2.c = 'o';
    c2.count = 300;

    pthread_create(&pid1, NULL, &char_print, &c1); // create thread 1 to print 'x'
    pthread_create(&pid2, NULL, &char_print, &c2); // create thread 2 to print 'o'

    // main() returns without waiting, so the process may exit before either thread has finished printing
    return 0;
}

Joining the threads:

Joining forces main() to wait for the thread to finish.
pthread_join() is used to wait for the thread. It takes 2 params: the first is the thread ID, the second is a pointer that receives the return value of the thread. NULL is passed if we don't need the return value.

Ex:
pthread_join(pid1,NULL);

A thread cannot call pthread_join() to join itself.
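Here is a small sketch of joining a thread and picking up its return value. The names sum_params, sum_range and sum_in_thread are invented for the example, and the result is carried back as a small integer packed into the void * that the thread function returns.

```c
#include <pthread.h>
#include <stdint.h>

struct sum_params   /* parameters handed to the thread function */
{
    int from;
    int to;
};

/* Thread function: adds up the integers from p->from to p->to. */
static void *sum_range(void *params)
{
    struct sum_params *p = params;
    intptr_t total = 0;

    for (int i = p->from; i <= p->to; i++)
        total += i;
    return (void *)total;   /* small integer carried in the pointer value */
}

/* Runs sum_range in a new thread, waits for it, and returns its result
   (or -1 if the thread could not be created or joined). */
long sum_in_thread(int from, int to)
{
    struct sum_params p = { from, to };   /* stays alive until the join below */
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, sum_range, &p) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (long)(intptr_t)result;
}
```

Since sum_in_thread() does not return before pthread_join() has finished, the local struct p is guaranteed to outlive the thread that reads it.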

** Make sure that any data you passed to the thread by reference is not deallocated, even by a different thread, until you are sure that the thread is done with it. This is true for both local variables and global variables.

pthread_self() - return the thread ID of the thread in which this function is called.

pthread_equal() - compares the thread ids.

pthread_cancel() - We can cancel a thread using this function. Later, we should join the cancelled thread to free up its resources. An asynchronously cancellable thread can be cancelled at any point in its execution, whereas a synchronously cancellable thread may be cancelled only at particular places in its execution. These places are called cancellation points.

pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) - This disables the cancellation of the thread.

Critical Sections:
A critical section is a sequence of code that must be executed in its entirety or not at all; in other words, if a thread begins executing the critical section, it must continue until the end of the critical section without being cancelled.

Thursday, January 8, 2009

Linux - Processes

Creating Processes:

fork() - Duplicate the process. Returns pid of the child process to the parent. Returns zero to the child process.

exec family - replaces the program running in a process with another program. Used to run programs from within another program.

execvp and execlp - accept a program name and search for a program by that name in the current execution path. The exec functions without a 'p' in their name (such as execv and execl) need the complete path of the program.

nice - assigns a priority to a process. By default every process has a niceness of zero. A higher nice value means lower priority, and vice versa.

=> nice -n 10 sort 1.txt > 2.txt // runs the command "sort 1.txt > 2.txt" with a nice value of 10

Signals:

A signal is a special message sent to a process. Signals are asynchronous; when a process receives a signal, it processes the signal immediately.

The signal type is specified by a signal number; in programs we usually use the signal name.

These are defined in /usr/include/bits/signum.h. In our programs we include <signal.h>.

Program contains signal-handler functions to process the signals.

ex:

SIGBUS - bus error

SIGSEGV - segmentation fault

SIGFPE - floating point exception

A process may also send a signal to another process. One common use of this is

SIGTERM or SIGKILL

SIGTERM - ask the process to terminate. The process may ignore the request by masking or ignoring the signal.

SIGKILL - always kills the process immediately because the process may not mask or ignore SIGKILL.

Signal handling function:

The sigaction function can be used to set a signal disposition. The first parameter is the signal number. The next two parameters are pointers to sigaction structures. The first of these contains the desired disposition for that signal number, while the second receives the previous disposition. The most important field in the first or second sigaction structure is sa_handler. It takes one of three values.

- SIG_DFL - default disposition for the signal

- SIG_IGN - signal should be ignored

- A pointer to signal-handler function. The function should take one parameter, ie., signal number and return void.

Signal handler should perform minimum work necessary to respond to the signal and then return control to the main program. A signal handler may be interrupted by another signal. So we need to be careful while placing code in the signal handler. Even assigning a value to a global variable can be dangerous as the assignment may actually be carried out in 2 or more machine instructions and a second signal may occur between them, leaving the variable in a corrupted state.

If we use a global variable to flag a signal in a signal-handler function, it should be of the special type sig_atomic_t. Linux guarantees that assignments to variables of this type are carried out in a single instruction.

Ex:

----

kill(child_pid, SIGTERM) - sends SIGTERM to the child from the parent. Include <sys/types.h> and <signal.h> if we want to use the kill function.

Wait - The wait system calls allow us to wait for a process to finish executing, and enable the parent process to retrieve information about its child's termination.

wait(&child_status) - Waits for the child to exit and gets the return value in child_status

waitpid(child_pid, &child_status, 0) - Waits for the child with pid = child_pid to exit.

wait3 - returns resource usage statistics (such as CPU time) about the exiting child process

wait4 - allows us to specify additional options about which process to wait for.
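As a rough sketch of fork() and waitpid() working together, the function below creates a child that exits at once with a given code and lets the parent read that code back. child_exit_status is a hypothetical name; the WIFEXITED and WEXITSTATUS macros come from <sys/wait.h>.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code`; the parent waits for
   it and returns the exit code it observed, or -1 on any failure. */
int child_exit_status(int code)
{
    int child_status;
    pid_t child_pid = fork();

    if (child_pid < 0)
        return -1;                 /* fork failed */
    if (child_pid == 0)
        _exit(code);               /* child: fork() returned zero */

    /* parent: fork() returned the pid of the child */
    if (waitpid(child_pid, &child_status, 0) != child_pid)
        return -1;
    if (WIFEXITED(child_status))   /* child ended by calling exit/_exit */
        return WEXITSTATUS(child_status);
    return -1;                     /* child was ended some other way, e.g. by a signal */
}
```

Since the parent calls waitpid() for the child, the child does not linger as a zombie after it has exited.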

Zombie processes - a process that has terminated but has not been cleaned up yet.

This happens when a child process terminates while the parent is not calling wait(). It is the responsibility of the parent process to clean up its zombie children.

When the parent exits without cleaning up the zombie child, the child is inherited by the init process (pid = 1), and init cleans up all the zombie processes it inherits.

When a child terminates, Linux sends a SIGCHLD signal to the parent process. The signal handler can be modified to clean up the child processes.

Saturday, January 3, 2009

Compilers, Assemblers, Linkers

Regardless of OS, the program has to pass 4 stages to make a running executable

Preprocessing is the first pass of any C compilation. It processes include-files, conditional compilation instructions and macros.

Compilation is the second pass. It takes the output of the preprocessor, and the source code, and generates assembler source code.

Assembly is the third stage of compilation. It takes the assembly source code and produces an assembly listing with offsets. The assembler output is stored in an object file.

Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect new addresses (a process called relocation)

ELF (Executable and Linking Format) is a file format that defines how an object file is composed and organized. With this information, your kernel and the binary loader know how to load the file, where to look for the code, where to look for the initialized data, which shared libraries need to be loaded and so on.

Different sections in an executable format are:

Section

Description

.text

This section contains the executable instruction codes and is shared among every process running the same binary. This section usually has READ and EXECUTE permissions only. This section is the one most affected by optimization.

.bss

BSS stands for ‘Block Started by Symbol’. It holds un-initialized global and static variables. Since the BSS only holds variables that don't have any values yet, it doesn't actually need to store the image of these variables. The size that BSS will require at runtime is recorded in the object file, but the BSS (unlike the data section) doesn't take up any actual space in the object file.

.data

Contains the initialized global and static variables and their values. It is usually the largest part of the executable. It usually has READ/WRITE permissions.

.rdata

Also known as .rodata (read-only data) section. This contains constants and string literals.

.reloc

Stores the information required for relocating the image while loading.

Symbol table

A symbol is basically a name and an address. The symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. It contains an array of symbol entries, and a symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index.

Relocation records

Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have relocation entries, which describe how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Simply said, relocation records are information used by the linker to adjust section contents.

Good link describing the use of readelf and object dump:

Process Loading:

Before we can run an executable, we first have to load it into memory.

This is done by the loader, which is generally part of the operating system. The loader does the following things (among other things):

Memory and access validation - First, the OS kernel reads in the program file’s header information and validates its type, access permissions, memory requirement and its ability to run its instructions. It confirms that the file is an executable image and calculates memory requirements.

Process setup includes:

Allocates primary memory for the program's execution.

Copies address space from secondary to primary memory.

Copies the .text and .data sections from the executable into primary memory.

Copies program arguments (e.g., command line arguments) onto the stack.

Initializes registers: sets the esp (stack pointer) to point to top of stack, clears the rest.

Jumps to start routine, which: copies main()'s arguments off of the stack, and jumps to main().

Address space is the memory space that contains the program code, stack, and data segments; in other words, all the data the program uses as it runs.

The memory layout, consisting of three segments (text, data, and stack), is shown in simplified form in Figure w.5.

The dynamic data segment is also referred to as the heap, the place dynamically allocated memory (such as from malloc() and new) comes from. Dynamically allocated memory is memory allocated at run time instead of compile/link time.

This organization enables any division of the dynamically allocated memory between the heap (explicitly) and the stack (implicitly). This explains why the stack grows downward and heap grows upward.

Run time data structure: From sections to segments

A process is a running program. This means that the operating system has loaded the executable file for the program into memory, has arranged it to have access to its command-line arguments and environment variables, and has started it running.

Typically a process has 5 different areas of memory allocated to it as listed in Table w.5 (refer to Figure w.4):

Segment

Description

Code - text segment

Often referred to as the text segment, this is the area in which the executable instructions reside. For example, Linux/Unix arranges things so that multiple running instances of the same program share their code if possible. Only one copy of the instructions for the same program resides in memory at any time. The portion of the executable file containing the text segment is the text section.

Initialized data – data segment

Statically allocated and global data that are initialized with nonzero values live in the data segment. Each process running the same program has its own data segment. The portion of the executable file containing the data segment is the data section.

Uninitialized data – bss segment

BSS stands for ‘Block Started by Symbol’. Global and statically allocated data that are initialized to zero by default are kept in what is called the BSS area of the process. Each process running the same program has its own BSS area. When running, the BSS data are placed in the data segment. In the executable file, they are stored in the BSS section. In the Linux/Unix executable format, only variables that are initialized to a nonzero value occupy space in the executable’s disk file.

Heap

The heap is where dynamic memory (obtained by malloc(), calloc(), realloc() and new for C++) comes from. Everything on a heap is anonymous, thus you can only access parts of it through a pointer. As memory is allocated on the heap, the process’s address space grows. Although it is possible to give memory back to the system and shrink a process’s address space, this is almost never done because it will be allocated to other process again. Freed memory (free() and delete) goes back to the heap, creating what is called holes. It is typical for the heap to grow upward. This means that successive items that are added to the heap are added at addresses that are numerically greater than previous items. It is also typical for the heap to start immediately after the BSS area of the data segment. The end of the heap is marked by a pointer known as the break. You cannot reference past the break. You can, however, move the break pointer (via brk() and sbrk() system calls) to a new position to increase the amount of heap memory available.

Stack

The stack segment is where local (automatic) variables are allocated. In C program, local variables are all variables declared inside the opening left curly brace of a function body including the main() or other left curly brace that aren’t defined as static. The data is popped up or pushed into the stack following the Last In First Out (LIFO) rule. The stack holds local variables, temporary information, function parameters, return address and the like. When a function is called, a stack frame (or a procedure activation record) is created and PUSHed onto the top of the stack. This stack frame contains information such as the address from which the function was called and where to jump back to when the function is finished (return address), parameters, local variables, and any other information needed by the invoked function. The order of the information may vary by system and compiler. When a function returns, the stack frame is POPped from the stack. Typically the stack grows downward, meaning that items deeper in the call chain are at numerically lower addresses and toward the heap.

Table w.5

When a program is running, the initialized data, BSS and heap areas are usually placed into a single contiguous area called a data segment.

The stack segment and code segment are separate from the data segment and from each other as illustrated in Figure w.4.

Although it is theoretically possible for the stack and heap to grow into each other, the operating system prevents that event.

The relationship among the different sections/segments is summarized in Table w.6, executable program segments and their locations.

Executable file section (disk file)	Address space segment	Program memory segment
.text	Text	Code
.data	Data	Initialized data
.bss	Data	BSS
-	Data	Heap
-	Stack	Stack
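One way to get a feel for these areas is to print a sample address from each of them. This is only a sketch: the names initialized_global, uninitialized_global and print_layout are invented, and the actual addresses (and even their relative order) depend on the system, the compiler and address-space randomization, so no particular output is promised.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* initialized data (.data section) */
int uninitialized_global;      /* zero-filled BSS (.bss section) */

/* Prints one sample address per area. Returns 0 on success,
   -1 if the heap allocation fails. */
int print_layout(void)
{
    int local = 0;                            /* stack */
    int *dynamic = malloc(sizeof *dynamic);   /* heap */

    if (dynamic == NULL)
        return -1;

    /* Casting a function pointer to void * is not strict ISO C,
       but POSIX systems allow it. */
    printf("code  : %p\n", (void *)print_layout);
    printf("data  : %p\n", (void *)&initialized_global);
    printf("bss   : %p\n", (void *)&uninitialized_global);
    printf("heap  : %p\n", (void *)dynamic);
    printf("stack : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

On a typical Linux system the stack address comes out far above the others, with the code, data and BSS addresses close together near the bottom.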

The Process:

The diagram below shows the memory layout of a typical C process. The process's load segments (corresponding to "text" and "data" in the diagram) are placed at the process's base address.

The main stack is located just below and grows downwards. Any additional threads that are created will have their own stacks, located below the main stack.

Each of the stacks is separated by a guard page to detect stack overflows. The heap is located above the process and grows upwards.

In the middle of the process's address space, there is a region reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory.

It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager will extract the name of the dynamic interpreter from the program header.

The dynamic interpreter points to a shared library that contains the runtime linker code. The process manager will load this shared library in memory and will then pass control to the runtime linker code in this library.

C reference 3

Union - A derived data type, whose members share the same storage space. The members of a union can be of any type and the number of bytes used to store a union must be at least enough to hold the largest member.

Only a value of the same type as the first union member can be used to initialize a union in its declaration. For example:
union sample
{
int p;
float q;
};
...
...
union sample content = {234}; // initializes p
union sample content = {23.44}; // does not initialize q: the value is converted to int and stored in p
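The sketch below checks both points about the union: its size covers the largest member, and a plain initializer goes to the first member. The helper function names are made up, and the last function uses a C99 designated initializer (.q = ...), which is one way to initialize a member other than the first.

```c
#include <stddef.h>

union sample
{
    int p;
    float q;
};

/* Number of bytes used to store the union. */
size_t sample_size(void)
{
    return sizeof(union sample);
}

/* A brace initializer without a designator sets the first member, p. */
int first_member_after_init(void)
{
    union sample content = {234};
    return content.p;
}

/* C99 designated initializer: sets q instead of p. */
float float_member_after_designated_init(void)
{
    union sample content = { .q = 23.44f };
    return content.q;
}
```

Since p and q share the same storage, writing one member overwrites the other, so only the member written last should be read back.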

String reverse function using recursion:

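#include <stdio.h>
#include <string.h>
void reverse(char *);

int main() {
    char c[20];
    if (fgets(c, sizeof c, stdin) == NULL) // gets() cannot limit the input length, so use fgets()
        return 1;
    c[strcspn(c, "\n")] = '\0'; // drop the trailing newline that fgets() keeps
    reverse(c);
    printf("\n");
    return 0;
}

void reverse(char *s)
{
    if(s[0] == '\0')
        return;
    else
    {
        reverse(&s[1]); // print the rest of the string first
        putchar(s[0]);  // then this character, so the output comes out reversed
    }
}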

Function prototype	Function description
int getchar(void)	Input the next character from the standard input (keyboard) and return it as an integer.
char *gets(char *s)	Input characters from the standard input (keyboard) into the array s until a newline or end-of-file is encountered. A terminating null character is appended to the array. gets() cannot limit how much it reads, and C11 removed it; prefer fgets().
int putchar(int c)	Print the character stored in c.
int puts(const char *s)	Print the string s followed by a newline character.
int sprintf(char *s, const char *format, ...)	Equivalent to printf() except the output is stored in the array s instead of printing on the screen.
int sscanf(const char *s, const char *format, ...)	Equivalent to scanf() except the input is read from the array s instead of reading from the keyboard.

strtok() -

char *strtok(char *s1, const char *s2)

A sequence of calls to strtok() breaks string s1 into “tokens”, logical pieces such as words in a line of text, separated by characters contained in string s2. The first call contains s1 as the first argument, and subsequent calls to continue tokenizing the same string contain NULL as the first argument. A pointer to the current token is returned by each call. If there are no more tokens when the function is called, NULL is returned.
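The calling pattern described above (s1 on the first call, NULL afterwards) can be sketched as a small helper. split_words and its parameters are hypothetical names; note that strtok() modifies the string it tokenizes, so the text must be a writable array and not a string literal.

```c
#include <string.h>

/* Splits text into tokens separated by any character in separators.
   Stores up to max_tokens token pointers in tokens[] and returns how
   many were stored. text is modified in place by strtok(). */
int split_words(char *text, const char *separators, char *tokens[], int max_tokens)
{
    int count = 0;
    char *token = strtok(text, separators);   /* first call: pass the string */

    while (token != NULL && count < max_tokens) {
        tokens[count++] = token;
        token = strtok(NULL, separators);     /* later calls: pass NULL */
    }
    return count;
}
```

For the array char line[] = "this is a line of text" and the separator string " ", the helper should store six tokens, the first being "this" and the last "text".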

Memory Functions:

Function prototype	Function description
void *memcpy(void *s1, const void *s2, size_t n)	Copies n characters from the object pointed to by s2 into the object pointed to by s1. The two objects must not overlap. A pointer to the resulting object is returned.
void *memmove(void *s1, const void *s2, size_t n)	Copies n characters from the object pointed to by s2 into the object pointed to by s1. The copy is performed as if the characters are first copied from the object pointed to by s2 into a temporary array, then from the temporary array into the object pointed to by s1, so the objects may overlap. A pointer to the resulting object is returned.
int memcmp(const void *s1, const void *s2, size_t n)	Compares the first n characters of the objects pointed to by s1 and s2. The function returns 0, less than 0, or greater than 0 if s1 is equal to, less than, or greater than s2.
void *memchr(const void *s, int c, size_t n)	Locates the first occurrence of c (converted to unsigned char) in the first n characters of the object pointed to by s. If c is found, a pointer to c in the object is returned. Otherwise NULL is returned.
void *memset(void *s, int c, size_t n)	Copies c (converted to unsigned char) into the first n characters of the object pointed to by s. A pointer to the result is returned.
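A short sketch of why memmove() exists next to memcpy(): shifting data inside one buffer means source and destination overlap. shift_right is a made-up helper, and the '-' fill character is an arbitrary choice.

```c
#include <string.h>

/* Shifts the first n bytes of buf to the right by `shift` positions
   (bytes pushed past position n are dropped) and fills the gap at the
   front with '-'. Does nothing if shift is 0 or not smaller than n. */
void shift_right(char *buf, size_t n, size_t shift)
{
    if (shift == 0 || shift >= n)
        return;
    memmove(buf + shift, buf, n - shift);   /* regions overlap: memmove, not memcpy */
    memset(buf, '-', shift);                /* fill the vacated bytes */
}
```

Applied to the six characters of "abcdef" with a shift of 2, the buffer should end up holding "--abcd".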

Dynamic Memory Allocation:

- Dynamic memory is allocated on the heap.

Disk file segments

For more details: http://www.tenouk.com/ModuleZ.html

Heap

- The heap segment provides more stable storage of data for a program than the stack; memory allocated in the heap remains in existence until it is freed or the program ends.

- Global variables (external storage class) and static variables are not allocated by malloc(); they live in the data and BSS areas next to the heap. Memory in these areas that is initialized to zero at program start remains zero until the program makes use of it, so it need not contain garbage.

malloc() :

char * test;

test = (char *) malloc(10); // allocates 10 bytes of memory

calloc():

int * test;

test = (int *) calloc(5, sizeof(int)); // allocates 5* 4 (size of int) bytes

- Initializes the allocated memory elements to zero.

realloc():

Resizes a previously allocated block of memory; the contents are preserved up to the smaller of the old and new sizes.

void * realloc (void * pointer, size_t size);
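A common use of realloc() is a growable array. The sketch below is one possible shape, with the hypothetical names int_vec and int_vec_push and an arbitrary growth policy (start at 4 elements, then double).

```c
#include <stdlib.h>

struct int_vec      /* a growable array of int */
{
    int *items;     /* heap block, NULL while empty */
    size_t len;     /* elements in use */
    size_t cap;     /* elements allocated */
};

/* Appends value, growing the block with realloc() when it is full.
   Returns 0 on success, -1 if realloc() fails (the vector is unchanged). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *grown = realloc(v->items, new_cap * sizeof *grown);

        if (grown == NULL)
            return -1;      /* the old block in v->items is still valid */
        v->items = grown;   /* realloc may have moved the block */
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}
```

The result of realloc() goes into a temporary pointer first: assigning it straight to v->items would lose the old block if the call failed. Starting from struct int_vec v = { NULL, 0, 0 } works because realloc(NULL, size) behaves like malloc(size); the caller releases the block with free(v.items).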

Friday, January 2, 2009

C reference 2

Some complex pointer declarations:

int *x
x is a pointer to int data type.

int *x[10]
x is an array[10] of pointer to int data type.

int *(x[10])
x is an array[10] of pointer to int data type.

int **x
x is a pointer to a pointer to an int data type – double pointers.

int (*x)[10]
x is a pointer to an array[10] of int data type.

int *funct()
funct is a function returning an integer pointer.

int (*funct)()
funct is a pointer to a function returning int data type – quite familiar constructs.

int (*(*funct())[10])()
funct is a function returning pointer to an array[10] of pointers to functions returning int.

int (*(*x[4])())[5]
x is an array[4] of pointers to functions returning pointers to array[5] of int.

* - is a pointer to

[ ] - is an array of

( ) - is a function returning

& - is an address of
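Two of these declaration shapes in use, as a sketch. The names one, two, funct_table, sum_row and call_all are invented for the example.

```c
static int one(void) { return 1; }
static int two(void) { return 2; }

/* funct_table is an array[2] of pointers to functions returning int. */
int (*funct_table[2])(void) = { one, two };

/* row is a pointer to an array[10] of int. */
int sum_row(int (*row)[10])
{
    int total = 0;

    for (int i = 0; i < 10; i++)
        total += (*row)[i];     /* dereference first, then index */
    return total;
}

/* table is (a pointer to the first element of) an array of pointers to
   functions returning int; calls each one and adds up the results. */
int call_all(int (*table[])(void), int n)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += table[i]();
    return total;
}
```

The parentheses in int (*row)[10] are what make row a pointer to an array; without them, int *row[10] would be an array of ten pointers.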

File Pointers:
reading from one file and writing to another file

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
int i;
char buf[100];
FILE *fin, *fout;
fin = fopen("text1.txt","r+");
fout = fopen("text2.txt","w+");
if(fin == NULL)
{
printf("Couldn't open input file \n");
exit(1);
}
if(fout == NULL)
{
printf("Couldn't open output file \n");
exit(1);
}

while (fgets(buf,100,fin) != NULL) {
fputs(buf, fout);
printf("%s", buf);
}
fclose(fin);
fclose(fout);
return 0;
}

fgets,fputs - read and write data line by line.. if we want to read / write block of data then we need code like below:

// feof(fin) returns non-zero only after a read has hit end of file, so test the fread() result instead
while ((i = fread(buf, sizeof(char), LEN - 1, fin)) > 0)
{
/* Reading: fread stores data into buf, each element sizeof(char) bytes, at most LEN - 1 elements; 'i' is the number of elements actually read */

buf[i] = '\0'; // terminate so buf can be printed as a string
printf("%s", buf);

// Writing
fwrite(buf, sizeof(char), i, fout);
/* writes i elements of sizeof(char) bytes each from buf; returns the number of elements actually written */

}
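A block copy can also be packaged as a standalone helper that is driven by the fread() return value and checks fwrite() as well. This is a sketch under assumptions: the name copy_stream and the CHUNK buffer size are made up for illustration.

```c
#include <stdio.h>

#define CHUNK 4096   /* assumed buffer size, not from the original notes */

/* Hypothetical helper: copies everything from in to out in blocks.
   Returns the number of bytes copied, or -1L on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    char buf[CHUNK];
    size_t n;
    long total = 0;

    /* fread() returns 0 at end of file or on error, which ends the loop */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1L;         /* short write */
        total += (long)n;
    }
    return ferror(in) ? -1L : total;
}
```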

Random access:
fseek() and ftell() deal with random access to a file.

fseek() - moves the file position indicator to the place where we need it.

int fseek(FILE *stream, long offset, int whence);

whence =
SEEK_SET - File beginning
SEEK_CUR - Current file pointer position
SEEK_END - End of file

offset - number of bytes from the "whence"

fseek() returns "0" if it is successful or non-zero if it fails.

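- if SEEK_END is selected, the offset is normally zero or negative so that the position lands inside the file; a positive offset moves past the end of the file.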

We can obtain the current value of the position indicator with ftell():
long ftell(FILE *stream);

ftell() returns the current value of the position indicator (i.e., for a binary stream, the number of bytes from the beginning of the file), or -1L on error.
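A common use of fseek() together with ftell() is finding the size of a file. The sketch below assumes a stream opened in binary mode, where the value returned by ftell() is a byte count; the helper name file_size is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes, or -1L on
   error. The original position is restored before returning. */
long file_size(FILE *fp)
{
    long start, size;

    start = ftell(fp);                  /* remember where we were */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* jump to the end of the file */
        return -1L;
    size = ftell(fp);                   /* offset of the end = size */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1L;
    return size;
}
```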

ex:
#include <stdio.h>
#include <stdlib.h>
#define LEN 100

int main(int argc, char **argv) {
char buf[LEN];
FILE *fin;
long offset1;

fin = fopen("text1.txt","r");
if(fin == NULL)
{
printf("Couldn't open input file \n");
exit(1);
}
/* record the offset before each line is read; stop when fgets() fails,
so the last line is not printed twice as it would be with while (!feof(fin)) */
offset1 = ftell(fin);
while (fgets(buf, LEN, fin) != NULL){
printf("Offset = %ld\n",offset1);
printf("==> %s\n", buf);
offset1 = ftell(fin);
}

fclose(fin);
return 0;
}

rewind() - resets the file position indicator to the beginning of the file.
rewind(fptr); is equivalent to (void) fseek(fptr, 0L, SEEK_SET); except that rewind() also clears the stream's error indicator.
Example for writing and then reading from the same binary file:

#include <stdio.h>
#include <stdlib.h>
#define LEN 100
#define MAX_NUM 5

void DataWrite(FILE *fptr);
void DataRead(FILE *fptr);

int main(int argc, char **argv) {
int i;
char buf[LEN];
FILE *fin, *fout;
long offset1,offset2,offset3,offset4;

fin = fopen("text1.bin","wb+");
if(fin == NULL)
{
printf("Couldn't open input file \n");
exit(1);
}

DataWrite(fin);
rewind(fin);
DataRead(fin);

fclose(fin);
return 0;
}

void DataWrite(FILE *fout)
{
int i;
double buf[MAX_NUM] = {123.44, 345.53, 134.12, 865.45, 454.34};
for (i=0; i<MAX_NUM; i++)
{
printf("%5.2f\n",buf[i]);
fwrite(&buf[i], sizeof(double), 1, fout);
}
}

void DataRead(FILE *fin)
{
int i;
double x;
for (i=0; i<MAX_NUM; i++)
{
fread(&x,sizeof(double),(size_t)1, fin);
printf("%5.2f\n",x);
}
}

fscanf() and fprintf() allow the programmer to specify the I/O stream.

ex:
fscanf(fileptrIn, "%i", &value);
fprintf(fileptrOut, "Average of %i numbers = %f \n", count, total/(double)count);

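If successful, fscanf() returns the number of input items assigned, which can be fewer than requested (even 0) if the input does not match the format. It returns EOF if the input ends or fails before the first conversion.
fprintf() returns the number of characters written, or a negative value on error.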
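The return values described above are what drive a typical read loop. The following sketch reads integers until fscanf() stops converting and then writes their average; the helper name write_average is hypothetical, and it uses %d rather than %i so that input such as 010 is read as decimal.

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated integers from in and
   writes their average to out. Returns the number of values read, or -1
   if fprintf() reports an error. */
int write_average(FILE *in, FILE *out)
{
    int value, count = 0;
    long total = 0;

    /* fscanf() returns 1 for each successful conversion; 0 or EOF ends the loop */
    while (fscanf(in, "%d", &value) == 1) {
        total += value;
        count++;
    }
    if (count > 0 &&
        fprintf(out, "Average of %d numbers = %f\n",
                count, total / (double)count) < 0)
        return -1;
    return count;
}
```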

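remove(filename) - Deletes the file named "filename"
rename(oldname, newname) - Renames a file

rmdir() - Removes an empty directory (POSIX, declared in <unistd.h>; not part of standard C).
mkdir() - Creates a directory (POSIX, declared in <sys/stat.h>; not part of standard C).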

Pre-processor directives:
#error - stops compilation and prints an error message that includes the tokens specified

ex:

#if (MyVAL != 2 && MyVAL != 3)
#error MyVAL must be defined to either 2 or 3
#endif

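#pragma - passes an implementation-defined instruction to the compiler (for example, #pragma once or #pragma pack); a compiler ignores pragmas it does not recognize.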

# - applied to a macro argument, converts the argument's tokens to a string surrounded by double quotes

ex:
#define HELLO(x) printf("Hello, " #x "\n");
causes HELLO(John) to expand to printf("Hello, " "John" "\n");

## - concatenates two tokens.
#define CAT(p, q) p ## q
CAT(O,K) is replaced by OK in the program.
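Both operators can be tried out in a small sketch. The names STR, greet_john, counter_1 and read_counter are made up for illustration; only CAT comes from the notes above.

```c
#include <stdio.h>

#define STR(x) #x            /* stringizing: STR(John) becomes "John" */
#define CAT(p, q) p ## q     /* token pasting: CAT(a, b) becomes ab */

/* Hypothetical helper: writes the greeting built with # into buf.
   Adjacent string literals are joined by the compiler. */
int greet_john(char *buf, size_t size)
{
    return snprintf(buf, size, "Hello, " STR(John) "\n");
}

/* CAT(counter_, 1) pastes into the single identifier counter_1 */
int CAT(counter_, 1) = 42;

int read_counter(void)
{
    return CAT(counter_, 1);
}
```

Note that # and ## act on the argument tokens exactly as written, so passing another macro as the argument does not expand it first.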

#line - renumbers the subsequent source code lines starting with the specified constant integer value. For example:
#line 100 - starts line numbering from 100, beginning with the next source code line.

Pre defined macros:

__DATE__ - The date the source file is compiled (a string of the form "mmm dd yyyy" such as "Jan 19 1999").

__LINE__ - The line number of the current source code line (an integer constant).

__FILE__ - The presumed name of the source file (a string).

__TIME__ - The time the source file is compiled (a string literal of the form "hh:mm:ss").

__STDC__ - The integer constant 1. This is intended to indicate that the implementation is ANSI C compliant.
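A short sketch of how these macros behave in practice, including the effect of #line on __LINE__. The function names build_stamp and line_after_directive are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats the compile date and time into buf */
int build_stamp(char *buf, size_t size)
{
    /* __DATE__ is "Mmm dd yyyy" and __TIME__ is "hh:mm:ss" */
    return snprintf(buf, size, "built %s %s", __DATE__, __TIME__);
}

/* Hypothetical helper: shows that #line renumbers the following line */
int line_after_directive(void)
{
#line 100
    return __LINE__;   /* this source line is now numbered 100 */
}
```

Compiler diagnostics for the rest of the file are also renumbered after the #line directive, which is why it is mostly used by code generators rather than in hand-written code.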
