Blog

Learning the bash Shell – (Reading Notes)

These are some summary notes taken while reading the book, which was written by Cameron Newham & Bill Rosenblatt.

0. Summary of bash features:

(features common with C shell):
– Directory manipulation (pushd, popd, dirs)
– Job control (fg, bg), also stop jobs with CTRL-Z
– Brace expansion (generating arbitrary strings)
– Tilde expansion, shorthand way to refer to directories
– Aliases, used to define shorthand names for commands/command lines
– Command history
(bash’s major new features):
– command-line editing (vi-, emacs-style)
– key bindings
– integrated programming features (functionality of several external UNIX commands, such as test, expr, getopt and echo, is built into the shell)
– control structures (select construct, easy menu generation)
– new options and variables (env customization)
– 1D arrays that allow easy referencing and manipulation of lists of data
– dynamic loading of built-ins (also can write your own and load them into running shell)

Chapter 1

  • A shell is any user interface to the UNIX operating system.
  • There are various types of user interface. Bash belongs to the most common category, known as the character-based user interface. (Another category is the graphical user interface, GUI.)

Interactive Shell Use:
Open a terminal and type 'bash' to start an interactive bash session. To exit, type 'exit' or 'logout', or press CTRL-D.

A shell command line consists of one or more words. The first word is the 'command' and the following words are its 'arguments' (also called parameters).
e.g. lp myfile
lp: print a file
myfile: the name of a file to print

An ‘option’ is a special type of argument that gives the command specific information on what it is supposed to do. (usually starts with a dash)

e.g. lp -h myfile
-h: tells lp not to print the 'banner page' before it prints the file
myfile: the name of the file to print

e.g.2 lp -d lp1 -h myfile
-d lp1: send the output to the printer called lp1
-h: same as above
myfile: the name of the file to print
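
To make the command / option / argument split concrete, here is a small Python sketch that breaks the second example into words the way a shell would. The helper name split_command is made up for illustration, and treating every word that starts with a dash as an option is a simplification.

```python
import shlex

def split_command(line):
    """Split a command line into (command, options, other arguments)."""
    words = shlex.split(line)            # shell-like word splitting
    command, rest = words[0], words[1:]
    options = [w for w in rest if w.startswith("-")]
    others = [w for w in rest if not w.startswith("-")]
    return command, options, others

command, options, others = split_command("lp -d lp1 -h myfile")
print(command)   # lp
print(options)   # ['-d', '-h']
print(others)    # ['lp1', 'myfile']
```

Note that lp1 lands among the plain arguments here even though it is really the value that belongs to -d; only the command itself knows which options take a value.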

Files

  • Regular Files
  • Executable Files
  • Directories

fastText Notes

fastText is a handy and efficient tool for document classification and word vector representation.

The major difference between fastText and gensim word2vec is:
fastText takes sub-word vectors/representations into account. So even for new or misspelled words (out of vocabulary), it can still give a reasonable word vector based on the training data.

e.g. 'words' never appeared in the training text, but 'word' did. A fastText model can still place 'words' close to 'word'.
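
A rough sketch of why this works: the word is wrapped in boundary markers and cut into character n-grams, and an unseen word shares many of those n-grams with words that were seen. The function name char_ngrams is made up for this note, and the 3 to 6 length range is an assumption based on what I understand to be fastText's defaults for word-vector training.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of a word, with < and > as boundary markers."""
    marked = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams

# n-grams that the unseen word 'words' has in common with the seen word 'word'
shared = char_ngrams("word") & char_ngrams("words")
print(sorted(shared))
```

The vector for an out-of-vocabulary word can then be built from the vectors of its n-grams, so the overlap above is what pulls 'words' towards 'word'.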

Common Terminal Command used for text classification

Run these commands from inside the fastText folder:

Supervised text classification model training:
./fasttext supervised -input text.train -output model_text -lr 1.0 -epoch 25 -wordNgrams 2 -bucket 200000 -dim 50 -loss hs

bucket: word and character n-gram features are hashed into a fixed number of buckets in order to limit the memory usage of the model. The -bucket option sets that number. The larger the number of buckets, the larger the final model.
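
The hashing idea can be sketched in a few lines of Python. This only illustrates the general trick with a plain 32-bit FNV-1a hash; it is not meant to reproduce fastText's exact hashing, and the function names are made up.

```python
def fnv1a_32(text):
    """Plain 32-bit FNV-1a hash of a string's UTF-8 bytes."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) % 2**32
    return h

def bucket_index(ngram, num_buckets=200000):
    """Map an n-gram to one of num_buckets slots; different n-grams may collide."""
    return fnv1a_32(ngram) % num_buckets

for gram in ["the cat", "cat sat", "sat on"]:
    print(gram, "->", bucket_index(gram))
```

However many distinct n-grams the training text contains, they all land in one of num_buckets slots, which is why memory use is bounded by the bucket count and why a larger count gives a larger model.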

After training, to obtain the k most likely labels and their associated probabilities for a new input text document:

./fasttext predict-prob model_text.bin test.txt k
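
If the predictions need further processing in Python, each output line can be split into (label, probability) pairs. This sketch assumes every output line alternates a label and its probability and that labels carry the __label__ prefix; the sample line is invented, not real model output.

```python
def parse_prediction_line(line, label_prefix="__label__"):
    """Turn 'label prob label prob ...' into a list of (label, probability) pairs."""
    tokens = line.split()
    pairs = []
    for label, prob in zip(tokens[0::2], tokens[1::2]):
        if label.startswith(label_prefix):
            label = label[len(label_prefix):]
        pairs.append((label, float(prob)))
    return pairs

sample = "__label__baking 0.71 __label__bread 0.12"   # made-up example line
print(parse_prediction_line(sample))
```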

Handy Codes

This page records the code snippets I use frequently,
which I may never manage to remember… 🙂

File management with Jupyter Notebook

# extract a zip file that has been uploaded to jupyter
import zipfile
with zipfile.ZipFile('the_file.zip', 'r') as zip_ref:
    zip_ref.extractall()

# pack the current folder into a tarball so it can be downloaded from jupyter
!tar chvfz notebook.tar.gz *

Retina Matplotlib setting

%config InlineBackend.figure_format = 'retina'

Flatten a list of lists (This one I managed to remember after…quite a while)

flat_list = [item for sublist in l for item in sublist]
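
For comparison, a small runnable version with sample data, next to the itertools equivalent. The variable names and the sample list are only for illustration.

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]

# the comprehension reads as: for each sublist, for each item in that sublist
flat_comprehension = [item for sublist in nested for item in sublist]

# chain.from_iterable does the same one-level flattening
flat_chain = list(chain.from_iterable(nested))

print(flat_comprehension)  # [1, 2, 3, 4, 5]
print(flat_chain)          # [1, 2, 3, 4, 5]
```

Both only flatten one level; a list nested deeper than that stays nested.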

Reload imported packages on the fly
(when you are testing your own .py code in a jupyter notebook, this lets you pick up changes to the .py file without restarting the kernel)

%load_ext autoreload
%autoreload 2
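
Outside of IPython, or for a one-off reload, the manual counterpart is importlib.reload. The sketch below writes a throwaway module to a temporary folder, edits it and reloads it; the module name scratch_module_demo is made up, and bytecode caching is switched off inside the demo only so that the quick edit cannot be masked by a cached file.

```python
import importlib
import pathlib
import sys
import tempfile

def reload_demo():
    """Import a throwaway module, edit its source, and reload it."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True          # keep .pyc files out of the demo
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "scratch_module_demo.py"
        source.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("scratch_module_demo")
            before = module.VALUE
            source.write_text("VALUE = 22\n")   # simulate editing the .py file
            importlib.reload(module)            # pick up the edit, no restart
            after = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("scratch_module_demo", None)
            sys.dont_write_bytecode = old_flag
    return before, after

print(reload_demo())  # (1, 22)
```

The autoreload extension does this kind of reload automatically before each cell runs, which is why it is more convenient in a notebook.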

Frequently Used MySQL Code

# Add a unique constraint on an existing column (here: url)
ALTER TABLE table_group.table_name ADD UNIQUE (url);

# Add an unsigned integer column that does not allow NULL
ALTER TABLE table_group.table_name ADD COLUMN colname bigint unsigned not null;

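# Turn off safe-update mode so an UPDATE without a key-based WHERE clause is allowed
SET SQL_SAFE_UPDATES = 0;
SET @@global.sql_mode = 'NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

# Fill url_crc with the CRC32 checksum of url
UPDATE table_group.table_name B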
JOIN (
   SELECT id, CRC32(url) url_crc FROM table_group.table_name
 ) as A ON A.id = B.id
 SET B.url_crc = A.url_crc;
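
To compute the same kind of checksum on the Python side (for example before looking a url up), zlib.crc32 can be used. This assumes MySQL's CRC32() is the standard CRC-32, which as far as I know is also what zlib implements; it is worth checking one value against the database before relying on it. The helper name url_crc is made up.

```python
import zlib

def url_crc(url):
    """Unsigned 32-bit CRC of the url's UTF-8 bytes."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF

print(url_crc("https://example.com/page"))
```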

# Remove all content in a table
Truncate table_group.table_name;

# delete a column
ALTER TABLE table_group.table_name DROP COLUMN colname;

# create a new table with existing table format
CREATE TABLE new_table_name LIKE existing_table_name;

# Call Stored Procedure as an event
CREATE EVENT `event`
ON SCHEDULE EVERY 1 DAY
DO CALL procedure;

# Stored Procedures
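# Copy the rows of table_name2 into table_name1, then empty table_name2. With 'IGNORE', rows that would violate a unique key are skipped instead of raising an error, so duplicates are dropped automatically.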
CREATE DEFINER=`root`@`%` PROCEDURE `procedure_name`()
BEGIN
INSERT IGNORE INTO table_group.table_name1 (col1, col2...)
SELECT col1, col2... FROM table_group.table_name2;
TRUNCATE table_group.table_name2;
END
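
The effect of the procedure can be tried out without a MySQL server. The sketch below swaps in SQLite, where the equivalent statement is INSERT OR IGNORE and where DELETE stands in for TRUNCATE; the table and column names are invented for the demo.

```python
import sqlite3

def merge_staging(conn):
    """Copy staging rows into the main table, skipping duplicates, then empty staging."""
    conn.execute(
        "INSERT OR IGNORE INTO main_table (url, jd) "
        "SELECT url, jd FROM staging_table"
    )
    conn.execute("DELETE FROM staging_table")   # SQLite has no TRUNCATE
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (url TEXT UNIQUE, jd TEXT)")
conn.execute("CREATE TABLE staging_table (url TEXT, jd TEXT)")
conn.execute("INSERT INTO main_table VALUES ('a', 'old')")
conn.executemany("INSERT INTO staging_table VALUES (?, ?)",
                 [("a", "new"), ("b", "new")])

merge_staging(conn)
rows = conn.execute("SELECT url, jd FROM main_table ORDER BY url").fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM staging_table").fetchone()[0]
print(rows)          # [('a', 'old'), ('b', 'new')]
print(staging_left)  # 0
```

The row with url 'a' keeps its old value because the duplicate from staging is ignored, not merged.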


Connect MySQL with Jupyter
Comments:
1. When using the 'INSERT INTO' command, conn.commit() is important, otherwise the content will not be inserted into the table.
2. With MariaDB, quote table and column names with backticks (`), not single quotes (')!

import pymysql.cursors

# connect to the database ('xxx' values are placeholders)
conn = pymysql.connect(host='xxx',
                       user='root',
                       password='xxx',
                       db='table_group',
                       charset='utf8mb4',
                       cursorclass=pymysql.cursors.DictCursor)

cursor = conn.cursor()
# table and column names go in backticks, values are passed as parameters
cursor.execute("""create table `pytest` (`id` int(11), `name` varchar(20));""")
cursor.execute("""INSERT INTO `table_name` (`url`, `jd`) VALUES (%s, %s)""", ('test', 'test'))
conn.commit()  # without this the INSERT is not saved
cursor.close()
conn.close()
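
Why the commit matters can be seen without a MySQL server. This sketch uses Python's built-in sqlite3 as a stand-in (not pymysql) with a throwaway database file: a second connection does not see the inserted row until the first one commits.

```python
import os
import sqlite3
import tempfile

def count_rows(path):
    """Open a separate connection and count the rows it can see."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT COUNT(*) FROM pytest").fetchone()[0]
    finally:
        reader.close()

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.db")
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE pytest (id INTEGER, name TEXT)")
    writer.commit()

    writer.execute("INSERT INTO pytest (id, name) VALUES (?, ?)", (1, "test"))
    before_commit = count_rows(db_path)   # the row is still pending
    writer.commit()
    after_commit = count_rows(db_path)    # now other connections see it
    writer.close()

print(before_commit, after_commit)  # 0 1
```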

Exploring GIS Fundamentals

I recently finished this fundamentals course on Geographic Information Systems (GIS) on Coursera, offered by UC Davis.

UC Davis is very strong in geotechnical engineering, as we all knew at UCLA, so it is nice that they share this course on Coursera. During winter quarter 2016, the UCLA geotech master's program also introduced a GIS course for the first time. I really wanted to take it, but it clashed with two of my main courses, so I missed it. In that class, most students were from hydrology, with only one or two from structures and geotech.

As my interests later turned to data analysis and programming, I find GIS very powerful when it comes to data visualization and manipulation. I believe there is a lot more to learn if I finish this specialization.

White Point Landslide Analysis

This is a landslide project done during my master's course "Earth Retaining Structure and Slope Stability". It is based on a real case that happened in 2011 near White Point Beach, Los Angeles, USA.

The original report was prepared by Shannon & Wilson, Inc., and we used it as the reference to verify our model in Slide.

The analyses were performed using Slide 7.0 by Rocscience, along with some Excel calculations.

White Point Landslide Analysis Final Report