Work on the Server Machines

About Servers

Heavy computational tasks are usually run on specially designed machines, called “server machines”. These machines are usually accessed remotely and shared by many users. To serve all users, and to let each of them feel as if they own the machine (i.e., without noticing the other users), the machine starts a program (or, more precisely, a process) for each user. Such a program is called a “server”. In other words, a server is a piece of software running on the machine, not the machine itself.

There are a number of servers that we frequently use:

  • The shell: the layer above the operating system kernel. It provides abstract functionality to the user and hides the details of how the operating system works.

  • The kernel of the operating system: the core functionality of the system. Typical examples are “Linux”, “Darwin”, and “Windows NT”. Ordinary users do not touch the kernel directly; they use its functionality through the shell or the GUI of the system. Application programmers use “system calls”, i.e., a set of functions exposed through programming languages, to interact with the kernel. System developers modify the kernel directly, either by writing kernel modules or by changing the existing kernel code.

  • The HTTP server: serves a website. For example, when you visit https://lig.astro.tsinghua.edu.cn/, the server of LIG returns a page, together with a set of style and script files, to your browser. The browser renders them into the web page you see.

This talk introduces the specific functionality of the “shell” that is relevant to your daily remote access to the server machine. A more complete guide to the shell itself is presented in the section Shell.

ssh - Linking to the Server

To start a shell remotely, run the ssh command on your local computer:

$ ssh [-p Port] RemoteAddress

Examples:

$ ssh 166.111.131.54
$ ssh hubble@166.111.131.54
$ ssh -p 2333 hubble@166.111.131.54

The first uses an IP(v4) address as the RemoteAddress. Since no user name is given, ssh logs in with your local user name and, unless another authentication method is set up, asks for the password. The second specifies a user name. The third specifies a Port number, which is set by the SSH service on the server machine; ask the system administrator for its value. By default, the Port number is “22”.

Using ssh Without a Password

Typing the password every time is tedious, but authentication is necessary for security. The solution is to let the local ssh command automatically provide the authentication information to the remote machine.

First, enter the directory ~/.ssh/ and check whether there are two files named id_rsa.pub and id_rsa. If not, generate them by:

$ ssh-keygen -t rsa [-C YourEmail]

For example:

$ ssh-keygen -t rsa -C 'hubble@cis.edu'

The email is optional.

The file id_rsa.pub is a public key that tells people or the server machine who you are. You may give it to other people. The file id_rsa is a private key that proves you are the person named by the public key. Always keep the private key secret.

Now, upload the public key to the remote machine, simply by running:

$ ssh-copy-id [-p Port] -i ~/.ssh/id_rsa.pub RemoteAddress

For example:

$ ssh-copy-id -p 2333 -i ~/.ssh/id_rsa.pub hubble@166.111.131.54

Now you can use SSH to log in to the remote machine without a password. Other commands that use SSH as their data transfer protocol do not need a password either.

Note that some machines may turn off key-based authentication. On those machines, a password is still necessary.
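The check-then-generate step above can be sketched as a small shell function. This is only an illustration under stated assumptions: ensure_ssh_key is a hypothetical helper name, and the empty passphrase passed with -N is a choice made here for convenience, not something the steps above require.

```shell
# Sketch: create an RSA key pair only if one is not already present.
# ensure_ssh_key is a hypothetical helper. The first argument is the key
# directory (default: ~/.ssh), the second an optional comment such as an email.
ensure_ssh_key() {
    key_dir="${1:-$HOME/.ssh}"
    comment="${2:-}"

    if [ -f "$key_dir/id_rsa" ] && [ -f "$key_dir/id_rsa.pub" ]; then
        echo "Key pair already exists in $key_dir"
        return 0
    fi

    # The key directory should be readable by its owner only.
    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1

    # -f sets the output file and -N '' sets an empty passphrase
    # (assumption: pick a real passphrase if the key needs extra protection).
    ssh-keygen -q -t rsa -f "$key_dir/id_rsa" -N '' -C "$comment"
}
```

After loading the function, running ensure_ssh_key followed by the ssh-copy-id command shown above covers the whole setup.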

Access to Remote Files

scp - Copying File From/To the Remote

To copy files between two machines, simply run the scp (secure copy, i.e., copy over SSH) command:

$ scp [-P Port] [-r] SourceAddress DestAddress

Here “SourceAddress” and “DestAddress” can each be a local or a remote file or directory. If “SourceAddress” is a directory, like foo or foo/, you must use -r to recursively copy its files and sub-directories.

For example, the local files in the current directory are:

foo/
  |- bar.txt
  |- baz.py

You may copy them to a server machine by

$ scp -r foo hubble@166.111.131.54:/home/hubble

The directory foo and its contents are copied into the remote directory /home/hubble.

Sometimes a large copy takes a long time to finish. In such a case, you can move scp to the background:

$ scp -r foo hubble@166.111.131.54:/home/hubble

# Enter the password if asked.
# Then press Ctrl+Z to suspend scp and get the shell prompt back.

$ bg

The command bg resumes the suspended task in the background, so that you can use the shell for other tasks.

Note that scp always copies all contents, no matter whether the “DestAddress” already has them or not. The following command, rsync, on the other hand, compares local and remote files, and copies only if necessary.

rsync - Synchronizing With Remote File System

rsync is a more convenient command for transferring big data or large numbers of files. It is also a good way to make local backups of local files. To use rsync, run:

$ rsync [Option]... SourceAddress DestAddress

Taking the local directory foo/ above as an example, synchronize it to a server machine by running:

$ rsync -azP foo hubble@166.111.131.54:/home/hubble

The options are:

  • -a: “archive” mode, a combination of several options. It enables recursive transfer of directories and preserves symbolic links, modification times, and permissions. This is usually a good choice.

  • -z: compress the data during the transfer. This is usually also a good choice, because it reduces the amount of data to transfer.

  • -P: the combination of --progress and --partial. The first shows the progress of each transfer, which allows visual inspection. The second keeps partially transferred files instead of deleting them. If the task is interrupted for some reason, a later invocation of rsync can resume the unfinished transfer. This is a good choice for transferring big data files.

If nothing has been modified, running rsync a second time on the same directory transfers nothing, because rsync only synchronizes the files that have changed.

Note that the directory argument foo is different from foo/. The former puts the directory foo itself, together with its contents, into the remote folder /home/hubble, while the latter copies only its contents.
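The difference between foo and foo/ is easy to try out with local directories only, since rsync treats local and remote paths alike in this respect. The following is a minimal sketch; demo_trailing_slash and the directory names dest1 and dest2 are hypothetical and chosen only for illustration.

```shell
# Sketch: compare "foo" and "foo/" as rsync sources, using local
# directories only. demo_trailing_slash is a hypothetical helper; its
# argument is an empty scratch directory.
demo_trailing_slash() {
    base="$1"
    mkdir -p "$base/foo" "$base/dest1" "$base/dest2" || return 1
    echo "hello" > "$base/foo/bar.txt"

    # Without the slash: the directory itself is copied.
    rsync -a "$base/foo" "$base/dest1" || return 1

    # With the slash: only the contents of the directory are copied.
    rsync -a "$base/foo/" "$base/dest2" || return 1

    # List the resulting files, relative to the scratch directory.
    (cd "$base" && find dest1 dest2 -type f | sort)
}
```

Calling demo_trailing_slash "$(mktemp -d)" lists dest1/foo/bar.txt for the first form and dest2/bar.txt for the second.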

If a port number is required, you may pass an additional option, like $ rsync -azP -e 'ssh -p Port' ....

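In some cases, you want to synchronize only a part of the directory and ignore some of the files. In such a case, use the option --exclude='Pattern' to exclude them. To give several patterns at once, you can write --exclude={'Pattern1','Pattern2',...}; this form relies on the brace expansion of the shell (e.g., bash), which turns it into one --exclude option per pattern, so do not put spaces inside the braces. For example: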

$ rsync -azP --exclude={'bar.txt','baz.py'} foo hubble@166.111.131.54:/home/hubble

If lots of files need to be excluded, just create a new file, e.g., called rsync-excl.txt, write down all the patterns into it, one per line, and specify the exclusion as --exclude-from=rsync-excl.txt.
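A pattern file can be tried out locally before it is used against a real server. The sketch below is only an illustration: demo_exclude_from is a hypothetical helper, and the file notes.log and the pattern *.log are made up for the example.

```shell
# Sketch: exclude files listed in a pattern file, using local directories
# only. demo_exclude_from is a hypothetical helper; its argument is an
# empty scratch directory.
demo_exclude_from() {
    base="$1"
    mkdir -p "$base/foo" "$base/dest" || return 1
    : > "$base/foo/bar.txt"
    : > "$base/foo/baz.py"
    : > "$base/foo/notes.log"

    # The pattern file: one pattern per line.
    printf '%s\n' 'bar.txt' '*.log' > "$base/rsync-excl.txt"

    # Only foo/baz.py matches none of the patterns, so only it is copied.
    rsync -a --exclude-from="$base/rsync-excl.txt" "$base/foo" "$base/dest"
}
```

After the call, the dest directory holds foo/baz.py only; bar.txt and notes.log are skipped.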

rsync lets you double-check what will be transferred before the real synchronization:

$ rsync -azP -nv foo hubble@166.111.131.54:/home/hubble

Here the additional options are:

  • -n: dry-run. With this option, rsync prints the files to be transferred, but does not actually transfer them.

  • -v: verbose, which makes rsync print detailed information.

With both -n and -v options, you can check the output:

sending incremental file list
foo/
foo/bar.txt
foo/baz.py

sent 122 bytes  received 55 bytes  354.00 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

If the list is correct, drop -nv and run the actual synchronization.

sshfs - Mounting Remote File System to Local

If you want to access remote files as if they were in the local file system, without copying them, sshfs can help. This command simply “mounts” a remote directory into the local file system, so that it appears in the local file tree:

$ mkdir tempdir
$ sudo sshfs -o allow_other,IdentityFile=/home/edwin/.ssh/id_rsa hubble@166.111.131.54:/home/hubble ./tempdir

The first command creates a local directory as the mount point. It should be empty; some versions of sshfs refuse to mount on a non-empty directory. The second command is run with super-user privilege through sudo, which the allow_other option normally requires. Through -o you can pass one or more comma-separated options. Here allow_other lets local users other than the one who mounted the directory (root, in this case) access the mounted files. IdentityFile points to the private key we created in Using ssh Without a Password. If it is omitted, you will be asked for a password. The final two positional arguments are, just like in scp, the source directory and the mount point.

After mounting, you can use the remote files as if they were in the local system. You may edit them, and the changes are written to the remote machine. Depending on the network performance, however, this might be slow.

To un-mount the remote file system, run:

$ sudo umount ./tempdir
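Because the mount point is expected to be empty, it can be convenient to check this before calling sshfs. The following is a minimal sketch, not a complete tool: is_empty_dir and mount_remote are hypothetical helper names, and the sshfs options discussed above are left out for brevity.

```shell
# is_empty_dir DIR: succeed only if DIR exists and contains no entries.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# mount_remote REMOTE MOUNTPOINT: hypothetical wrapper that creates the
# mount point if needed and refuses to continue when it is not empty.
mount_remote() {
    remote="$1"
    mount_point="$2"

    mkdir -p "$mount_point" || return 1
    if ! is_empty_dir "$mount_point"; then
        echo "Mount point $mount_point is not empty" >&2
        return 1
    fi

    # Add -o options here as needed (see the explanation above).
    sshfs "$remote" "$mount_point"
}
```

For example, mount_remote hubble@166.111.131.54:/home/hubble ./tempdir mounts the remote home directory only if ./tempdir is empty.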

git - Synchronize your Code Repository

Source code, stored as human-readable text files, is usually organized in git repositories. Git also provides functionality to synchronize those repositories between computers.

Suppose that you have a local git repository created by the way described in the section Git and Github. To synchronize it to a remote machine, create an “empty” git repository on your desired location in the remote machine. For example:

mkdir repo-name && cd repo-name
git init

To enable a push to this repository, run the following configuration commands in this repository:

git config receive.denyCurrentBranch ignore
git config --bool receive.denyNonFastForward false

In your local repository, bind the remote repository by:

git remote add origin hubble@166.111.131.54:/home/hubble/repo-name/.git

Here, origin can be replaced by any other name for the remote repository. Then, any local change can be synchronized to the remote by a push:

git push origin master

In the remote repository, run reset to update the working directory to the pushed commit:

git reset --hard

Hint

The above git configurations and reset operations are dangerous when modifications are made both remotely and locally: git reset --hard discards uncommitted changes in the remote working directory. Refer to their manuals for details.
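The whole push-and-reset workflow can be rehearsed on a single computer, with two local directories standing in for the two machines. This is a sketch under stated assumptions: demo_push is a hypothetical helper, the remote is addressed by a local path instead of an SSH address, and the branch name master is set explicitly because the default branch name depends on the git configuration.

```shell
# Sketch: push to a non-bare repository and refresh its working directory.
# demo_push is a hypothetical helper; its argument is the absolute path of
# an empty scratch directory.
demo_push() {
    base="$1"
    mkdir -p "$base/remote" "$base/local" || return 1

    # "Remote" side: a normal repository that accepts pushes to the branch
    # it has checked out.
    (
        cd "$base/remote" &&
        git init -q &&
        git symbolic-ref HEAD refs/heads/master &&
        git config receive.denyCurrentBranch ignore &&
        git config --bool receive.denyNonFastForward false
    ) || return 1

    # "Local" side: commit one file and push it.
    (
        cd "$base/local" &&
        git init -q &&
        git symbolic-ref HEAD refs/heads/master &&
        echo "hello" > hello.txt &&
        git add hello.txt &&
        git -c user.name="Hubble" -c user.email="hubble@cis.edu" \
            commit -q -m "Add hello.txt" &&
        git remote add origin "$base/remote/.git" &&
        git push -q origin master
    ) || return 1

    # Back on the "remote" side: make the working directory match the push.
    (cd "$base/remote" && git reset -q --hard)
}
```

After demo_push "$(mktemp -d)", the file hello.txt is present in the remote directory. Before the reset, the remote repository already contains the commit, but its working directory does not show the file yet.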

wget - Download File From HTTP Server

The above commands are all ssh-based. They work well for a private remote machine, i.e., one from which you need a password (or a key) to fetch data.

Some resources, on the other hand, are made publicly available by HTTP servers. Commercial sites such as Google, Amazon, and Taobao all fit into this category. The simplest way to get those resources is to use a browser, such as Chrome, Firefox, or Edge. But some resources are distributed over many different locations and therefore require an automated script to download them. wget is provided for this purpose.

The simplest usage of wget is to download a web page, such as an HTML document. For example, to download the index page of LIG - an astrophysics team in China, run:

$ wget https://lig.astro.tsinghua.edu.cn/

This is equivalent to asking the LIG server “hey, give me the index document under the path https://lig.astro.tsinghua.edu.cn/”. The HTTP server responds to your query and returns to wget what you asked for: an index.html file.

To download any specific resource from an HTTP server, run:

$ wget [Option]... URL...

Here URL means Uniform Resource Locator, which is usually called a web address.

wget also allows recursive downloading, i.e., downloading all files under a page that match user-specified criteria. For more detailed usage, see its manual page by running $ man wget.
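For resources spread over many locations, a short script can loop over a list of addresses. The following is a minimal sketch, assuming the URLs are stored one per line in a text file; download_all and the DOWNLOADER variable are hypothetical names introduced here, the latter only so that the loop can be tried without network access.

```shell
# Command used for each download; override it to test the loop offline.
DOWNLOADER="${DOWNLOADER:-wget}"

# download_all LIST [DEST]: hypothetical helper that downloads every URL
# listed in the file LIST into the directory DEST (default: current one).
download_all() {
    list="$1"
    dest="${2:-.}"

    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
        esac
        # -c resumes a partial download; -P sets the target directory.
        "$DOWNLOADER" -c -P "$dest" "$url" || echo "Failed: $url" >&2
    done < "$list"
}
```

For example, download_all urls.txt data fetches each listed address into the directory data. For a plain list without comment lines, wget can also read the URLs itself through its -i option.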