Orientation Materials

New Students Please Read the Following Information

How to Google (Google, just Google it)

Rule of thumb: Use ENGLISH for every search, and get rid of any Chinese keywords.

How to read / write papers

  • How to read papers

  • Google scholar

    • Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, …
    • https://scholar.google.com/
  • Zotero

    • Helps you manage your papers: put the papers you find into Zotero to keep them organized.
    • Please also install the Better BibTeX for Zotero extension (https://retorque.re/zotero-better-bibtex/) to generate citation keys automatically. Please change the citation key format to: [auth:lower][year][veryshorttitle:lower].
    • https://www.zotero.org/

  • Visual Studio Code

    • vscode is a text editor. You can use it to write your papers and code.
    • Because vscode is only a text editor, you need to install a compiler separately if you want to compile your code.
    • https://code.visualstudio.com/
  • LaTeX / BibTeX

    • LaTeX is the language for writing papers. You can use Overleaf to get used to it.
    • Overleaf: https://www.overleaf.com/
    • You need to install a LaTeX compiler if you want to compile your LaTeX code on your own computer.
    • LaTeX compiler: https://www.latex-project.org/get/
    • For convenience,
      • Windows, Linux: TeX Live
      • Mac: MacTeX
    • Note that different online libraries provide BibTeX entries in different styles, which leads to terrible inconsistency. We have a convention for unifying the BibTeX entries. Please strictly follow the tutorial: bib File Writing Tutorial.
  • SVN

    • Subversion (svn) is version control software. In our lab, we use it to version-control the data of our papers.
    • Our svn servers: svn://snoopy.cs.nthu.edu.tw/nmsl and svn://snoopy.cs.nthu.edu.tw/ext
    • You need to install an svn client to access the data on the svn server.
    • We recommend TortoiseSVN for Windows. It has a GUI, so it is easy to use if you are a beginner.
    • On Mac and Linux, you can install svn from the command line.
    • Tutorial 
      • Linux: sudo apt install subversion

      • Mac: brew install svn

  • Inkscape

    • https://inkscape.org/
    • We use Inkscape to draw figures for papers.
    • We use Inkscape rather than ppt because the output of Inkscape is .svg or .eps, which are vector graphics formats. Vector graphics are better than ordinary raster figures because they keep high visual quality when you zoom in (like the following figure).

  • Evaluation guideline

Hints written by ChengHsin

Evaluation design is crucial for us systems folks. I used to teach each of you individually while you were working on your research papers.

This morning, I planned to put up a more comprehensive evaluation design for a paper we submitted to ICDCS in January. I then decided to write down the hints and ask the student author to redo her evaluation design instead. 🙂

Please find my notes attached. This is a mandatory reading material; please finish it, and discuss with me if you don’t understand any parts of it.

Hopefully, we will save some trial-and-error time when you work on your papers shortly.

Coding

  • Python

     
    • Jupyter Notebook 

Jupyter Notebook is an open-source web application. As long as you use Python, you can write programs, apply any visualization package, and even write tutorial documents in Jupyter Notebook. It is very convenient when you want to quickly prototype and demo your code, including data cleaning, numerical simulation, machine learning, etc.


Installation & run:

      • pip install jupyter

      • jupyter notebook

The jupyter notebook will be launched on localhost by default.

How to use:

      • We write code in cells; a cell is the selected box area in the following figure. We can run each cell separately. All variables and functions are shared across the cells.

      • Basic commands:
        • Run current cell: ctrl+Enter
        • Run current cell and jump to the next cell: shift+Enter
        • Add cell above/below: select cell & type A/B
        • Remove cell: select cell & type X (cut) or D, D (delete)
      • You can write markdown in the notebook. This is useful when you are writing documentation for your code.
        Nav bar > Cell > Cell Type > Markdown

Remote access:

We commonly run our experiments on powerful servers, which usually don't have a GUI. Thus, we need to launch Jupyter notebooks on the server and access them remotely. There are two ways to do that:

      • Port Forwarding with SSH Tunnel:

First, launch Jupyter Notebook on the server side. Note that you can do it inside a session-based tool, e.g., screen or tmux, so that your work does not vanish even if you close the connection to the server.
(server w/ tmux)

Second, do port forwarding with ssh on the client side.

      • ssh -L <local_port>:<host>:<host_port> <user>@<SSH server address>

(client)

By doing so, we can access the Jupyter notebook through localhost on the client side.
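
To make the forwarding step concrete, here is a minimal Python sketch that only assembles the ssh -L command from its parts and prints it. The helper name build_tunnel_command, the port 8888, and the user and server names are illustrative assumptions, not lab settings.

```python
import shlex


def build_tunnel_command(local_port, remote_host, remote_port, user, server):
    """Return the argument list for `ssh -L` (local port forwarding).

    Connections to localhost:<local_port> on the client are forwarded to
    <remote_host>:<remote_port> as seen from the SSH server.
    """
    forward = f"{local_port}:{remote_host}:{remote_port}"
    return ["ssh", "-L", forward, f"{user}@{server}"]


# Hypothetical values: a notebook listening on port 8888 of the server itself.
command = build_tunnel_command(8888, "localhost", 8888, "alice", "server.example.org")
print(shlex.join(command))
```

After running the printed command in a terminal, the notebook in this sketch would be reachable at http://localhost:8888 in the client's browser.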

      • Port Forwarding with the vscode extension

Install the Jupyter extension in vscode and just open your .ipynb file.

The extension will take care of everything.

    • Common packages
      • pandas
        • pandas is a Python package that we often use to organize and analyze our data before training models or visualizing information.
        • You can follow the "10 minutes to pandas" guide to take a quick look at it.
      • multiprocess
        • A Python package for running work in parallel processes. Leverage it in your large-scale experiments.
        • Official document: link
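
As a minimal sketch of how independent experiment runs can be spread over several processes, the example below uses the standard-library multiprocessing module (the third-party multiprocess package mirrors the same interface). The functions run_trial and run_all and the toy workload are hypothetical stand-ins.

```python
import multiprocessing


def run_trial(seed):
    """Stand-in for one independent experiment run (hypothetical workload)."""
    return seed * seed


def run_all(seeds, workers=2):
    """Run the trials in parallel and return the results in input order."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(run_trial, seeds)


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this file (e.g. Windows and macOS).
    print(run_all([1, 2, 3, 4]))
```

Each trial must be independent of the others for this pattern to be safe; results come back in the same order as the input seeds.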
      • subprocess
        • When you want to run shell commands in your Python code, the recommended approach is the subprocess module.
        • Official API document: link
          (os.system() is an alternative, but the subprocess module is more comprehensive.)
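
A short sketch of the subprocess approach: it runs a child process and captures its output. The child command here is just the current Python interpreter printing a message, chosen so the example works on any machine; in practice you would put your own command in the list.

```python
import subprocess
import sys

# Pass the command as a list so no shell quoting is involved.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())
```

Unlike os.system(), this gives you the exit code, stdout, and stderr as Python values that the rest of your script can inspect.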
      • pathlib
        • pathlib is an object-oriented filesystem path module, which is simpler than the os module when you manipulate file paths in your Python code. It wraps some of the functionality of os.

For example, to get the parent directory with the os module, you need:

          • os.path.dirname(os.getcwd())

However, in pathlib, you only need:

          • pathlib.Path.cwd().parent
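
A few more pathlib operations, as a sketch. The file path is hypothetical, and PurePosixPath is used so that nothing has to exist on disk and the output is the same on every platform; with real files you would normally use pathlib.Path.

```python
from pathlib import PurePosixPath

# A hypothetical result file; PurePosixPath only manipulates the path text.
result_file = PurePosixPath("/home/alice/project/results/run1.csv")

parent = result_file.parent                 # directory containing the file
project = result_file.parent.parent         # two levels up
figure = project / "figures" / "run1.pdf"   # join parts with the / operator
as_json = result_file.with_suffix(".json")  # swap the file extension

print(parent, project, figure, as_json, result_file.stem, sep="\n")
```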

      • argparse
        • When you want to pass parameters to your program, you can use argparse, which is the officially recommended module. It lets you define each argument's type, whether it is optional, its default value, and its description (shown by --help).
        • Official tutorial: link
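
The sketch below shows the pieces mentioned above (type, optional or not, default value, description) with the standard-library argparse module. The argument names dataset, --epochs, and --verbose are made up for illustration.

```python
import argparse


def build_parser():
    """Create a parser with one required, one optional, and one flag argument."""
    parser = argparse.ArgumentParser(description="Run one experiment (example).")
    parser.add_argument("dataset", help="path to the input dataset")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs (default: 10)")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress messages")
    return parser


# Passing a list lets us try the parser without a real command line.
args = build_parser().parse_args(["data.csv", "--epochs", "3"])
print(args.dataset, args.epochs, args.verbose)
```

In a real script you would call parse_args() with no arguments so that it reads the actual command line.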
      • matplotlib.pyplot
        • Matplotlib is a Python plotting library. You can use matplotlib in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
        • It is a good plotting tool that lets you quickly visualize your experiment results in Python. However, please don't use matplotlib to plot figures that will be used in your paper. Please use MATLAB if possible.
  • Regular Expression

  • Git & Github

Git is a free and open-source distributed version control tool. Whether your project is small or large, version control makes your work more efficient. When you work in a team, you need to ensure that nobody ruins the project when he/she modifies it. Even if an error unluckily occurs, you can still recover the project with a few commands. That is the power of version control.

Even if you work alone, you still need it to back up your source code. Suppose your deadline is only one day away, and you are busy coding on your machine without any backup or version control. Under pressure from Bear, you accidentally remove a crucial script (it is called Murphy's Law, you know). With Git, a single command such as "git restore <file>" would save your life without any effort.

Free Git & Github learning resource: link

  • HackMD

We often use HackMD to write documents, tutorials, project issues, and solution notes, just like the one you are reading now.
Here is a nice HackMD tutorial Book

  • Comment your code

Always write some human-readable descriptions when you are coding, because you never know when you or your labmates will need them in the future (in three months or one year, probably). Here is a nice article that teaches you how to comment Python code well.
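
As one possible illustration of the advice above, the sketch below documents a small function with a docstring (what it does, its inputs and outputs) and reserves inline comments for the non-obvious step. The function moving_average is a made-up example.

```python
def moving_average(values, window):
    """Return the simple moving average of `values`.

    Args:
        values: sequence of numbers, e.g. one measurement per second.
        window: number of consecutive samples averaged together (>= 1).

    Returns:
        A list with len(values) - window + 1 averages; empty if the
        input is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    # Slide the window one sample at a time; the range is empty when
    # len(values) < window, so short inputs fall through to [].
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        averages.append(sum(chunk) / window)
    return averages


print(moving_average([1, 2, 3, 4], window=2))
```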

  • Modularization

Modularization is an essential technique in software design. Proper modularization keeps your code maintainable, readable, and reliable. Modular programming is a software design technique that splits your code into separate parts, called modules. Note that the dependencies between modules should be minimized so that the modules can work independently without lots of noisy binding.
A well-modularized system is built from several modules with only limited dependencies among them. The executable application is created by putting them together.
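
A small sketch of the idea, assuming a toy task of summarizing latency samples. All function names are hypothetical. Each part has one responsibility and knows nothing about the others; only main wires them together.

```python
# In a real project each section below would live in its own file
# (e.g. loader.py, stats.py, report.py); they are shown together for brevity.

# --- loader: knows how to parse raw input, nothing else ---
def parse_latencies(text):
    """Turn whitespace-separated numbers into a list of floats."""
    return [float(token) for token in text.split()]


# --- stats: pure computation, no I/O ---
def summarize(samples):
    """Return the minimum, maximum, and mean of a non-empty list."""
    return {"min": min(samples), "max": max(samples),
            "mean": sum(samples) / len(samples)}


# --- report: formatting only ---
def format_summary(summary):
    """Render a summary dict as one human-readable line."""
    return ", ".join(f"{key}={value:.1f}" for key, value in summary.items())


# --- application: the only place where the modules are wired together ---
def main(raw_text):
    return format_summary(summarize(parse_latencies(raw_text)))


print(main("10 20 30"))
```

Because summarize does no parsing or printing, it can be tested or reused on its own, which is the practical payoff of keeping dependencies small.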

Linux

  • SSH

Simple Connection Guide
Comprehensive Guideline

  • Basic commands

Linux Commands 101
File Permission and Ownership
Input/Output Redirection

    • | (pipe)
    • grep
    • sort
    • find
    • awk
    • sed
    • locate
    • ld
    • which
    • tee
    • readlink
    • ln
  • GPU driver & CUDA

Follow the document: https://hackmd.io/4t7SbwIcSwukksWngapc8A?view
If you are using a PC, the GPU is usually plugged into a PCI-E slot. Oh, you need to shut down your PC first, right? Otherwise, a BBQ will be held on your motherboard.

  • Virtual env (docker, conda, venv, multipass)

Docker
conda (manages both Python packages and some apt packages)
venv (Python packages only)
multipass (virtual Ubuntu instance)

  • Server Administration

Fresh Ubuntu Server Installation

Docker User Namespace Remap
Please read this article very carefully and understand everything in it.
TLDR: We use userns-remap to prevent users who are not sudoers from gaining root permission on the host through commands executed inside a Docker container.

sudo command audit (installed on “teddy” only)
We record all commands that are executed with sudo.
You can replay them via this script.

Firewall (ufw)

fail2ban

Disk Management

    • df
    • lsblk
    • parted
    • mkfs
    • blkid
    • mount

NMSL Logo on teddy

    • lolcat
    • motd
    • put this file under /etc/update-motd.d/ (You can change the logo to whatever you want)
    • ASCII Generator
  • x-window

    • Windows
      • PuTTY (sets up the x-client for you)

      • VcXsrv

Put these lines in your .bashrc or .zprofile on the server:

      • # YourIP:0.0 for default setting in VcXsrv
        # '0.0' can be found in Windows system tray
        export DISPLAY="140.114.xx.xx:0.0"

Replace the address with your public IP.

Remember to check the third option (Disable access control) when starting VcXsrv.

  • bash, zsh

    • In case you forget what a shell is, this is for you.
    • If you are familiar with bash (the default shell in Ubuntu) and want to try something fancy to make your everyday life better, give zsh a try.
    • Personally, I recommend zplug as your first zsh framework.
    • Here is my own zsh setting if anyone wants a quick start.
    • [Rule of thumb] Choose the plugins you need and make your own zsh setting.
  • screen, tmux

    screen: Keeps your command running in the background and lets you control it whenever you want. A critical tool for experiments that take a long time to finish.
    tmux (link1, link2): Gives you additional terminals within a single tab to make your life easier.

  • Common issues on compiling

    • $PATH or $LD_LIBRARY_PATH is not set correctly
    • Version mismatch among the linked libraries, compiler, driver, etc.
    • Files not found: make sure everything is in the right place.
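
When a build fails for one of the reasons above, a quick first check is to look at what the search paths actually contain. The sketch below does this from Python; the helper names and the default tool name gcc are assumptions for illustration.

```python
import os
import shutil


def split_search_path(value):
    """Split a PATH-style string into its non-empty directory entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


def describe_environment(tool="gcc"):
    """Collect the settings that often explain a failed build."""
    return {
        "tool_location": shutil.which(tool),  # None if not found on $PATH
        "PATH": split_search_path(os.environ.get("PATH", "")),
        "LD_LIBRARY_PATH": split_search_path(os.environ.get("LD_LIBRARY_PATH", "")),
    }


for key, value in describe_environment().items():
    print(key, value)
```

If tool_location is None, or points to a different installation than you expected, the problem is likely in $PATH rather than in your code.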
  • Debugging tool (gdb)

Good tutorial

Miscellaneous

  • Rocket.chat

We use Rocket.Chat for communication.

      • Create a new account
      • Also download the mobile app from the Google Play store

  • Google Groups

The professor will add you to our Google Group so that you can send and receive emails and see all the group events.

To send an email to everyone, use the address nmsl-all@googlegroup.com

  • Google Calendar

The professor will add you to our Google Calendar so that you can check our lab events and book meeting times.

Clicking the office hour event on Google Calendar will bring you to the reservation page.

  • 1-1 Meeting Reservation

Step 1. Open your google calendar and click NMSL Office Hour (Graduate Students)

Step 2. Click the appointment page

Step 3. Choose a time slot you want

Step 4. Click save

 

  • Weekly reports

You need to email a weekly report to the professor every Friday. The email subject should look like: [WR] Your Name

Here is the content format:

Examples of not-very-useful todo tasks:
– Finish MM homework #2 <-- has nothing or very little to do with your research
– Write my paper <-- takes longer than 4 hours to complete; break it down
– Read some papers <-- tell me which papers you are reading
– Read more papers <-- define "more"

Here are more meaningful, to-the-point todo tasks:
Todo:
1. apply different numbers of features/factors
2. add features as factors for IS
3. add references to related work
4. sec. 3.1 and results writeup

  • Presentation slides

If you are going to present next week, you need to share your presentation topic with us at the preceding group meeting.

After your presentation, please go to the second link and upload your presentation slides.
Group meeting presentation title and order:
https://docs.google.com/spreadsheets/d/1Nk8V5x8DWbpaQhKh91I0kbwN0Q7bTGSQV7qka4aVqMQ/edit?usp=sharing

Group meeting presentation slides:
https://drive.google.com/drive/folders/1UejVCYAl_RIfriNZmQMx0F2hN1yuvfnw?usp=sharing
The file name of the slides should look like NAME_YYMMDD, for example, TC_210623
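
If you prefer to generate the file name rather than type it, here is a tiny sketch. It assumes the date part is a two-digit year, month, and day, as the TC_210623 example suggests; the helper name slide_file_name is made up.

```python
from datetime import date


def slide_file_name(initials, presented_on):
    """Build NAME_YYMMDD, e.g. initials 'TC' on 2021-06-23 -> 'TC_210623'."""
    return f"{initials}_{presented_on.strftime('%y%m%d')}"


print(slide_file_name("TC", date(2021, 6, 23)))
```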