Showing posts with label ubuntu. Show all posts

Friday, December 19, 2014

Basic Hadoop Operations


Create a directory in hadoop:


  • hadoop fs -mkdir hdfs://localhost:9000/user/hadoop/mytestdir2
  • hadoop fs -mkdir /user/hadoop/mytestdir3
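The two forms above refer to the same place only when the default filesystem (fs.defaultFS in core-site.xml) is hdfs://localhost:9000, which is an assumption about this setup. A minimal sketch that also creates missing parent directories with -p and then checks the result; the directory name mytestdir4/sub is made up for illustration:

```shell
# Assumed NameNode address, taken from the URI used above.
NAMENODE="hdfs://localhost:9000"
# Hypothetical directory, not from the original post.
TARGET="/user/hadoop/mytestdir4/sub"
FULL_URI="${NAMENODE}${TARGET}"

if command -v hadoop >/dev/null 2>&1; then
  # -p creates parent directories as needed and does not fail if the path exists.
  hadoop fs -mkdir -p "$TARGET"
  # -test -d exits with status 0 when the path is a directory.
  if hadoop fs -test -d "$FULL_URI"; then
    echo "created $FULL_URI"
  fi
else
  echo "hadoop is not on PATH; nothing was created"
fi
```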


List a directory:


  • hadoop fs -ls hdfs://localhost:9000/user/hadoop/
  • hadoop fs -ls /user/hadoop/



Create a local file (touch helloworld.txt for an empty file, or nano helloworld.txt to type some text), upload it to a Hadoop directory, and print it:

  • hadoop fs -put helloworld.txt /shiraz/hadoop/
  • hadoop fs -cat /shiraz/hadoop/helloworld.txt
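A sketch of the whole round trip, assuming the /shiraz/hadoop/ directory already exists in HDFS. The file content is invented, and -f (overwrite on upload) is added so the snippet can be rerun:

```shell
# Create a small local file with some content (touch would leave it empty).
LOCAL_FILE="helloworld.txt"
printf 'hello world\n' > "$LOCAL_FILE"

if command -v hadoop >/dev/null 2>&1; then
  # Upload; -f overwrites an existing copy in HDFS.
  hadoop fs -put -f "$LOCAL_FILE" /shiraz/hadoop/
  # Print the uploaded file to check its content.
  hadoop fs -cat "/shiraz/hadoop/$LOCAL_FILE"
else
  echo "hadoop is not on PATH; only the local file was created"
fi
```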


Compiling:
  • hadoop com.sun.tools.javac.Main WordCount.java
You may need to export the HADOOP_CLASSPATH variable first (there must be no space after the = sign):
export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
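The run command further down expects a wc.jar, so the compiled classes have to be packaged first. A sketch of the steps in order, assuming WordCount.java is in the current directory and that JAVA_HOME points at a JDK (the fallback path is the one used in this post):

```shell
# Fall back to the JDK path used above when JAVA_HOME is not set.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-7-openjdk-amd64}"
# tools.jar provides com.sun.tools.javac.Main; no space is allowed after '='.
export HADOOP_CLASSPATH="${JAVA_HOME}/lib/tools.jar"

if command -v hadoop >/dev/null 2>&1 && [ -f WordCount.java ]; then
  # Compile against the Hadoop classpath.
  hadoop com.sun.tools.javac.Main WordCount.java
  # Package the main class and its nested classes into wc.jar.
  jar cf wc.jar WordCount*.class
else
  echo "hadoop or WordCount.java not found; skipping build"
fi
```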


Running a Hadoop job:
  • hadoop jar wc.jar WordCount /shiraz/hadoop/input /shiraz/hadoop/output/paracount
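MapReduce refuses to start a job whose output directory already exists, and the results end up in part files inside that directory. A sketch around the command above; the part-r-* file name pattern assumes the job runs reducers with the default output format:

```shell
INPUT_DIR="/shiraz/hadoop/input"
OUTPUT_DIR="/shiraz/hadoop/output/paracount"

if command -v hadoop >/dev/null 2>&1; then
  # Remove a leftover output directory from a previous run, if there is one.
  if hadoop fs -test -d "$OUTPUT_DIR"; then
    hadoop fs -rm -r "$OUTPUT_DIR"
  fi
  hadoop jar wc.jar WordCount "$INPUT_DIR" "$OUTPUT_DIR"
  # The pattern is quoted so that HDFS, not the local shell, expands it.
  hadoop fs -cat "$OUTPUT_DIR/part-r-*"
else
  echo "hadoop is not on PATH; job not submitted"
fi
```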

hadoop executable location:
  • /usr/local/hadoop/bin/hadoop

Configuration files (*-site.xml):
  • /usr/local/hadoop/etc (in Hadoop 2.x the files normally sit in the etc/hadoop subdirectory)
  • mapred-site.xml
  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml

HDFS storage location:
  • /usr/local/hadoop_store/hdfs

Start scripts (start-all.sh, start-dfs.sh, start-yarn.sh):
  • /usr/local/hadoop/sbin

Thursday, December 18, 2014

Install Hadoop on Ubuntu


Set Date Time in Ubuntu

sudo date new_date_time_string

where new_date_time_string has to follow the format MMDDhhmmyyyy.ss, described below:

MM is a two-digit month, between 01 and 12
DD is a two-digit day, between 01 and 31, with the usual rules for days according to month and year applying
hh is a two-digit hour on the 24-hour clock, so it is between 00 and 23
mm is a two-digit minute, between 00 and 59
yyyy is the year; it can be two or four digits
ss is two-digit seconds. Notice the period . before the ss.

For example, to set the clock to 14:30:59 on December 18, 2014:

sudo date 121814302014.59
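To avoid assembling the string by hand, date can print a timestamp in the same layout. A sketch assuming GNU date (for the -d option); the Unix epoch is used only as an illustration:

```shell
# Format string matching MMDDhhmmyyyy.ss
FMT='+%m%d%H%M%Y.%S'

# Render a known instant (the Unix epoch, in UTC) in that layout.
STAMP="$(date -u -d @0 "$FMT")"
echo "$STAMP"

# Setting the clock needs root, so the command is only printed here.
echo "sudo date -u $STAMP"
```

For the epoch this prints 010100001970.00, which reads as January 1, 00:00, 1970, and 00 seconds.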

Enable Root Login For ssh - Ubuntu 14.04

Add or comment out the following lines in /etc/ssh/sshd_config, as shown below:


#PermitRootLogin without-password
PermitRootLogin yes
#StrictModes yes

# Change to no to disable tunnelled clear text passwords
PasswordAuthentication yes

UsePAM no
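Before restarting, it can help to confirm which values are actually active, since commented lines are ignored and sshd uses the first value it finds for each keyword. A small sketch; the helper name active_value is made up here, and it assumes keyword and value are separated by whitespace:

```shell
# Print the first active (uncommented) value of a directive in a config file.
# Usage: active_value FILE KEYWORD
active_value() {
  awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

CONFIG="/etc/ssh/sshd_config"
if [ -r "$CONFIG" ]; then
  for key in PermitRootLogin PasswordAuthentication UsePAM; do
    echo "$key = $(active_value "$CONFIG" "$key")"
  done
fi
```

Running sudo sshd -t afterwards additionally checks the file for syntax errors.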

Restart the ssh service:
sudo service ssh restart

If the SSH server is not installed, install it first:
sudo apt-get install openssh-server

The SSH server is not enabled by default in Ubuntu, but it is easy to add with OpenSSH, a free implementation of the SSH connectivity tools developed by the OpenBSD Project.

Another way to restart ssh:
sudo /etc/init.d/ssh restart

What is the difference between ssh_config and sshd_config?

ssh_config:
Configuration file for the ssh client on the machine you are working on. When you use the ssh command to connect to a remote host, the client takes its settings from ssh_config, such as the port number, protocol version and encryption/MAC algorithms. It has no bearing on incoming connections.

sshd_config:
Configuration file for sshd, the SSH daemon (the server process that listens for incoming connection requests on the ssh port). If someone wants to connect to your machine via SSH, their SSH client settings must be compatible with your sshd_config settings, such as the port number and protocol version.

Example:

Suppose the port in your ssh_config is set to 1000 (because a remote host you use often listens there) and the port in your sshd_config is set to 5555. Anyone who wants to connect to your host MUST point their ssh client at port 5555. Your own ssh client, however, will use port 1000 as its default (instead of 22) when connecting to a remote machine. If the remote machine listens on another port, including the standard one, you need to give the port number on the command line, e.g., "ssh -p 22 remote.host.ip".
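Instead of changing the client's global default port, a per-host entry can record the port a particular server listens on. A sketch that writes such an entry to a scratch file rather than the real ~/.ssh/config; the alias myserver and port 2222 are made up for illustration:

```shell
# Scratch client config so the real ~/.ssh/config is left untouched.
CLIENT_CONFIG="$(mktemp)"
cat > "$CLIENT_CONFIG" <<'EOF'
# Hypothetical server that listens on a non-default port.
Host myserver
    HostName remote.host.ip
    Port 2222
EOF

# One-off override on the command line (printed, not executed):
echo "ssh -p 2222 remote.host.ip"
# Same connection through the config entry (printed, not executed):
echo "ssh -F $CLIENT_CONFIG myserver"
```

With the entry in place, the port only has to be written down once, and other hosts keep using the client's default.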