
Ruby


Ruby is an interpreted, dynamically typed, reflective, semi-functional and object-oriented scripting language whose reference implementation is written in C. Ruby is said to be semi-functional because it supports higher-order functions, lambdas and closures (blocks). Ruby was created by Yukihiro "Matz" Matsumoto and was first released in 1995.

Matz's goal was to combine powerful features from various other programming languages to create a language optimized for developer happiness rather than computational efficiency. Ruby's object model mirrors that of Smalltalk, its syntax shares some similarities with Bash, Perl and Python, and the scoping rules for closures were taken from Lisp.

Special thanks to Postmodern and Ohdazed for their contributions to this article.

Basics

Development Environment

  • Installation
  • Gems

Interactive Ruby Console

IRB, or the Interactive Ruby Console, comes bundled with Ruby and allows you to interactively run code right from your command line. IRB can be invoked from your terminal simply by typing 'irb'.

$ irb --help
Usage:  irb.rb [options] [programfile] [arguments]
  -f                Suppress read of ~/.irbrc
  -m                Bc mode (load mathn, fraction or matrix are available)
  -d                Set $DEBUG to true (same as `ruby -d')
  -r load-module    Same as `ruby -r'
  -I path           Specify $LOAD_PATH directory
  --inspect         Use `inspect' for output (default except for bc mode)
  --noinspect       Don't use inspect for output
  --readline        Use Readline extension module
  --noreadline      Don't use Readline extension module
  --prompt prompt-mode
  --prompt-mode prompt-mode
                    Switch prompt mode. Pre-defined prompt modes are
                    `default', `simple', `xmp' and `inf-ruby'
  --inf-ruby-mode   Use prompt appropriate for inf-ruby-mode on emacs.
                    Suppresses --readline.
  --simple-prompt   Simple prompt mode
  --noprompt        No prompt mode
  --tracer          Display trace for each execution of commands.
  --back-trace-limit n
                    Display backtrace top n and tail n. The default
                    value is 16.
  --irb_debug n     Set internal debug level to n (not for popular use)
  -v, --version     Print the version of irb

$ irb
irb(main):001:0> puts 'woot'
woot
=> nil
irb(main):002:0>

Running irb --simple-prompt will provide you with a more basic looking Ruby shell.

$ irb --simple-prompt
>> puts 'woot'
woot
=> nil

You can use either 'exit', 'quit', or 'irb_exit' to close IRB and return to your terminal.

$ irb --simple-prompt
>> exit
$ # back in our terminal :D

RVM

RVM is a handy utility that can be used to manage multiple Ruby version installations on the same operating system. It allows you to run multiple interpreters and gemsets with a simple way to switch between them.

https://rvm.io/

Pry

Pry is a Ruby gem that provides an alternative to IRB (Interactive Ruby Console) with additional functionality for digging into Ruby code while you are coding. You can view the source code of any Ruby method from within the interpreter. It also provides some cool extras like Gist integration, syntax highlighting, and command shell integration.

Pry Homepage: https://github.com/pry/pry/

Your first program

Code

#!/usr/bin/ruby
puts "Hello world\n"

Style

While Ruby is not whitespace sensitive, most Rubyists follow a common style:

  1. Indent code using two spaces.
  2. Keep lines within 80 columns.
  3. Class/Module names are capitalized and CamelCased.
  4. Constants are UPPERCASED_AND_UNDERSCORED.
  5. Method/variable names are in snake_case.

For a complete Ruby Style Guide, please see the GitHub Ruby Style Guide.
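
The conventions above can be seen together in a small sketch. The names NetTools, DEFAULT_PORT and PortScanner are invented for this illustration and do not refer to a real library.

```ruby
module NetTools                 # Module/Class names: CamelCase
  DEFAULT_PORT = 22             # constants: UPPERCASE_WITH_UNDERSCORES

  class PortScanner
    def initialize(host_name)   # methods and variables: snake_case
      @host_name = host_name
    end

    def target                  # two-space indentation throughout
      "#{@host_name}:#{DEFAULT_PORT}"
    end
  end
end

puts NetTools::PortScanner.new('example.org').target   # prints example.org:22
```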

Explanation

Variables

Local

A local variable can only be used within the scope it was initialized in, such as a method body or the top level of a script. It is created by assigning an Object to a name that begins with a lowercase letter or an underscore.

 
foo = 'bar'
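
As a rough illustration of local scope (the method names shout and copy_via_block are made up for this sketch): a method body starts a fresh scope, while a block can see the locals that surround it.

```ruby
greeting = 'hello'   # a top-level local

def shout
  # A method body is a new scope: the top-level local is not visible here.
  defined?(greeting) ? greeting : :not_visible
end

def copy_via_block
  word = 'hello'
  copy = nil
  # A block is a closure, so it can read and assign the surrounding locals.
  1.times { copy = word }
  copy
end

p shout            # prints :not_visible
p copy_via_block   # prints "hello"
```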
 

Global

Global variables can be accessed from anywhere within the entire program. They are created by prefixing the variable name with the $ symbol. Reassigning a global variable changes its value everywhere, so globals are generally avoided when writing Ruby scripts; use class variables or constants instead.

 
$woot = 1337
 

Instance Variables

Instance variables begin with the '@' symbol. Accessing an uninitialized instance variable returns nil.

 
>> @instance
=> nil
>> @instance = 'ohdae'
=> "ohdae"
 

Instance variables can also be defined at the Class/Module level:

 
>> module Settings
>>   @config = {:verbose => false}
>>   def self.config; @config; end
>> end
=> :config
>> Settings.config
=> {:verbose=>false}
 

Class Variables

Class variables are shared by a Class or Module, its instances and its subclasses. They are created by prefixing the variable name with two '@' symbols. Reading a class variable that has not been initialized raises an error.

 
>> @@classvar
NameError: uninitialized class variable @@classvar in Object
	from (irb):5
	from :0
>> class Blackhat
>>   @@classvar = 'ohhai'
>>   def self.classvar; @@classvar; end
>> end
=> :classvar
>> Blackhat.classvar
=> "ohhai"
 

Predefined Variables

Certain names are predefined by Ruby. They are keywords (pseudo-variables), so they cannot be assigned to.

  • self: The current object. Similar to this in Java.
  • nil: The null object.
  • true: Boolean true.
  • false: Boolean false.

Data Types

Boolean Values

Ruby defines two boolean values, true and false:

 
>> 1 == 1
=> true
>> 1 == 2
=> false
 

Integers

Ruby also provides Integer literals:

 
>> 10
=> 10
>> 010 # octal
=> 8
>> 0x0a # hexadecimal
=> 10
>> 1_000_000 # short-hand for big values
=> 1000000
 

In Ruby, Integers are also Objects, which you can call methods on:

 
>> 0x41.chr
=> "A"
 

Floats

Ruby also provides floating point decimal literals:

 
>> 1.5
=> 1.5
>> 1.5e10
=> 15000000000.0
 

Floats are also Objects:

 
>> 1.5.round
=> 2
 

Strings

Ruby supports String literals as well.

 
>> "hello world"
=> "hello world"
>> 'hello\nworld'
=> "hello\\nworld"
>> "2 + 2 = #{2 + 2}" # String embedding
=> "2 + 2 = 4"
>> %{one
two
three}
=> "one\ntwo\nthree"
 

Strings are also Objects in Ruby:

 
>> "hello".reverse
=> "olleh"
 

Symbols

Symbols are like Strings, but used for keywords or identifiers:

 
>> :verbose
=> :verbose
>> :verbose == 'verbose'
=> false
>> :verbose == :verbose
=> true
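
A short sketch of how Symbols and Strings relate. It only uses core methods (to_sym, to_s, equal?); the variable names are chosen for this example.

```ruby
# Converting between Strings and Symbols
as_symbol = 'verbose'.to_sym   # => :verbose
as_string = :verbose.to_s      # => "verbose"

# A given Symbol is always the same object, while two equal Strings
# are usually two separate objects.
same_symbol  = :verbose.equal?(:verbose)                             # => true
same_strings = String.new('verbose').equal?(String.new('verbose'))   # => false
```

This is why Symbols are a common choice for Hash keys and option names: comparing them is an identity check.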
 

Regular Expressions

Ruby supports inline Regular Expressions, or Regexps. Regular Expressions can be used to match Strings:

 
>> "1-555-3333" =~ /(\d-)?\d{3}-\d{4}/
=> 0
>> match = "Please call 1-555-3333".match(/(\d-)?\d{3}-\d{4}/)
=> #<MatchData "1-555-3333" 1:"1-">
>> match[0]
=> "1-555-3333"
 

Regexps can also be used to rewrite Strings:

 
>> "hello world".gsub(/e/,'3').gsub(/o/,'0')
=> "h3ll0 w0rld"
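
As a further sketch, capture groups can be named, gsub can take a block, and scan collects every match. The constant PHONE and the group names are assumptions made for this example.

```ruby
PHONE = /(?<prefix>\d-)?(?<exchange>\d{3})-(?<line>\d{4})/

match = PHONE.match("Please call 1-555-3333")
match[:exchange]   # => "555"
match[:line]       # => "3333"
match.pre_match    # => "Please call "

# gsub also accepts a block that computes each replacement
masked = "1-555-3333".gsub(/\d/) { '#' }   # => "#-###-####"

# scan returns every match in the String
numbers = "555-1111 or 555-2222".scan(/\d{3}-\d{4}/)   # => ["555-1111", "555-2222"]
```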
 

Arrays

An Array is an ordered group of objects, very similar to lists in Python. The items inside an array are indexed starting at zero, and negative indexes count back from the end. An array can hold any mixture of objects. Creating an array can be done in a few different ways. You do not need to declare a variable as an array; for example, if you give Ruby a list of comma-separated values inside brackets, Ruby will recognize this as an array and use it as such from that point forward.

 
>> my_array = []
=> []
>> array_three = ["item", 5, foo, "Item2", "Strings can go here too"]
=> ["item", 5, "bar", "Item2", "Strings can go here too"]
>> array_three[0]
=> "item"
>> array_three[1]
=> 5
>> array_with_default = Array.new(3, 1)
=> [1, 1, 1]
>> array_with_default[0]
=> 1
>> filled_array = Array.new(5) { 0 }
=> [0, 0, 0, 0, 0]
 

You will notice the item 'foo' is printed as "bar" because we defined this earlier in this page.
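
A few everyday Array operations, as a minimal sketch. The array contents are arbitrary example strings.

```ruby
tools = ['nmap', 'hydra']
tools << 'nikto'            # append one element
tools.push('sqlmap')        # same effect as <<

tools.length             # => 4
tools[-1]                # => "sqlmap" (negative indexes count from the end)
tools[1, 2]              # => ["hydra", "nikto"] (start index, length)
tools[0..1]              # => ["nmap", "hydra"] (range of indexes)
tools[10]                # => nil (out of range, no exception)
tools.include?('nmap')   # => true

last = tools.pop         # removes and returns "sqlmap"
```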

Hashes or Associative Arrays

Ruby Hashes are similar to Arrays, except they store two objects for every item in the group. One of these objects is the 'key' and the other is the 'value'. This is very useful when you want to index items by something other than an array's zero-based index. A hash literal uses braces instead of the brackets an array uses.

 
new_hash = {
  :username  => 'admin',
  :password  => 'letmein123',
  :hostname  => 'blackhatacademy.org',
  :service   => 'ssh',
  :port      =>  22
}
 

You can use the key inside of brackets to return the value for that object.

 
>> new_hash[:username]
=> "admin"
 

You can iterate through a hash, yielding each key together with its value:

 
>> new_hash.each do |k, v|
>>   puts "key: #{k}"
>>   puts "value: #{v}"
>> end
key: username
value: admin
key: password
value: letmein123
key: hostname
value: blackhatacademy.org
key: service
value: ssh
key: port
value: 22
=> {:username=>"admin", :password=>"letmein123", :hostname=>"blackhatacademy.org", :service=>"ssh", :port=>22}
 

Hashes can also have default values:

 
>> counter = Hash.new(0)
=> {}
>> counter[:foo] += 1
=> 1
>> counter[:bar] += 2
=> 2
>> counter
=> {:foo=>1, :bar=>2}
 
>> require 'digest/md5'
=> true
>> md5s = Hash.new { |hash,key| hash[key] = Digest::MD5.hexdigest(key) }
=> {}
>> md5s['foo']
=> "acbd18db4cc2f85cedef654fccc4a4d8"
>> md5s['bar']
=> "37b51d194a7513e45b56f6524f2d51f2"
>> md5s
=> {"foo"=>"acbd18db4cc2f85cedef654fccc4a4d8", "bar"=>"37b51d194a7513e45b56f6524f2d51f2"}
 

Casting

Ruby is a dynamically typed language, so it doesn't have Static Types or Type-Casting. Instead it has methods which convert an Object into another format:

  • to_i: Converts an Object to an Integer.
  • to_f: Converts an Object to a Float.
  • to_s: Converts an Object to a String.
  • to_sym: Converts an Object to a Symbol.
  • to_a: Converts an Object to an Array.
  • to_set: Converts an Object to a Set.

Ruby also provides top-level methods for coercing an Object into another type:

 
>> Integer(1.5)
=> 1
>> Float(1)
=> 1.0
>> String(1.5)
=> "1.5"
>> Array(1)
=> [1]
>> Array(nil)
=> []
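
The two families differ in strictness: the to_* methods are lenient, while the top-level coercion methods raise on invalid input. The sketch below uses a hypothetical parse_port helper to show one way to take advantage of that.

```ruby
'42abc'.to_i   # => 42 (parses the leading digits, ignores the rest)
'abc'.to_i     # => 0  (nothing to parse)
nil.to_a       # => []
:port.to_s     # => "port"

# Integer() raises ArgumentError when the String is not a valid number,
# so it can be used to validate input.
def parse_port(text)
  Integer(text)
rescue ArgumentError
  nil
end

parse_port('22')      # => 22
parse_port('22abc')   # => nil
```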
 

Operators

Boolean

Ruby supports all the usual ANSI C boolean operators:

 
>> true && false # AND
=> false
>> true || false # OR
=> true
>> true ^ true # XOR
=> false
>> !true # NOT
=> false
 

The &&, || and ! operators can also be used with non-Boolean values. In Ruby, everything is treated as true except for nil and false. As a result, you can use the ||= operator to lazily initialize variables:

 
module Settings
 
  def self.path
    @path ||= File.join(ENV['HOME'],'.foorc')
  end
 
end
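
To make the truthiness rules concrete, here is a small sketch. The helper name truthy? is invented for this example; only nil and false come out as false.

```ruby
def truthy?(value)
  value ? true : false
end

truthy?(nil)     # => false
truthy?(false)   # => false
truthy?(0)       # => true (unlike C, zero is truthy)
truthy?('')      # => true (so is the empty String)
truthy?([])      # => true

# || returns its first truthy operand, which makes it handy for defaults
port = nil || 22   # => 22
```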
 

Ternary Operator

Ruby supports the ternary operator, just like ANSI C:

 
number >= 0 ? :integer : :negative
 

Bitwise Manipulations

Main article: Bitwise math

Ruby supports all of the common ANSI C bit-wise operators:

 
>> 0xff & 0x02 # AND
=> 2
>> 0x02 | 0x1  # OR
=> 3
>> 0x02 ^ 0x3  # XOR
=> 1
>> ~0x0        # inverse
=> -1
>> 0x1 << 2    # left-shift
=> 4
>> 0x3 >> 1    # right-shift
=> 1
 

Statements

If

Like other scripting languages, Ruby supports if statements:

 
if x > 10
  puts 'greater than 10'
elsif x == 10
  puts 'equal to 10'
else
  puts 'less than 10'
end
 

if statements can also be written as one-liners:

 
puts 'x is a negative number' if x < 0
 

Ruby also provides an unless statement, which is equivalent to if !expression.
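
A minimal sketch of both forms of unless. The check_password method and its return values are assumptions made for this illustration.

```ruby
def check_password(password)
  # Block form: the body runs only when the condition is false.
  unless password.length >= 6
    return :too_short
  end

  # Modifier form, equivalent to: return :no_digit if !(password =~ /\d/)
  return :no_digit unless password =~ /\d/

  :ok
end

check_password('abc')       # => :too_short
check_password('abcdefg')   # => :no_digit
check_password('abcdef1')   # => :ok
```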

Case
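
Ruby's case statement compares a value against each when clause using the === operator, so Classes, Regexps and Ranges can all be used as patterns. The classify method below is a hypothetical example, not part of any library.

```ruby
def classify(value)
  case value
  when Integer      then :integer        # Integer === value
  when 'ssh', 'ftp' then :service        # several values in one branch
  when /\A\d+\z/    then :digit_string   # Regexp === value
  when 1.0..9.9     then :small_float    # Range === value
  else                   :unknown
  end
end

classify(22)       # => :integer
classify('ssh')    # => :service
classify('8080')   # => :digit_string
classify(2.5)      # => :small_float
classify(nil)      # => :unknown
```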

Begin

Rescue

Ensure
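
begin, rescue and ensure work together for exception handling: begin opens the protected block, rescue handles a matching exception, and ensure always runs. The safe_divide method below is a made-up sketch that records which parts were executed.

```ruby
def safe_divide(a, b)
  log = []
  begin
    log << :begin
    result = a / b
  rescue ZeroDivisionError
    log << :rescue   # runs only when the exception is raised
    result = nil
  ensure
    log << :ensure   # always runs, with or without an exception
  end
  [result, log]
end

safe_divide(6, 3)   # => [2, [:begin, :ensure]]
safe_divide(1, 0)   # => [nil, [:begin, :rescue, :ensure]]
```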

Loops

For

Ruby supports for ... in loops:

 
>> for i in (1..10)
>>   puts i
>> end
1
2
3
4
5
6
7
8
9
10
 

In practice you rarely need for loops, thanks to Ruby's excellent support for Enumerators.

While

Ruby also supports traditional while loops:

 
>> queue = [1,2,3,4]
=> [1, 2, 3, 4]
>> while (i = queue.shift)
>>   puts i
>> end
1
2
3
4
 

Until

The until loop is a short-hand for saying while !expression:

 
>> queue = [1,2,3,4]
=> [1, 2, 3, 4]
>> until queue.empty?
>>   puts queue.shift
>> end
1
2
3
4
 

Enumerators

Ruby has superb support for Enumerators, or Iterators.

 
>> (1..5).each { |i| puts i }
1
2
3
4
5
=> 1..5
>> (1..5).first(2)                   # get the first two integers
=> [1, 2]
>> (1..5).any? { |i| i % 2 == 0 }    # are any even integers?
=> true
>> (1..5).all? { |i| i < 10 }        # are all integers less than 10?
=> true
>> (1..5).select { |i| i % 2 == 0 }  # select only even integers
=> [2, 4]
>> (1..5).map { |i| i * 2 }          # multiply every integer by two
=> [2, 4, 6, 8, 10]
 

These methods are defined by including the Enumerable module, and defining an each which enumerates over the elements of the Object:

 
>> class PasswordBruter
>>   include Enumerable
>>   def initialize(start_word)
>>     @start_word = start_word
>>     @end_word   = ('x' * start_word.length)
>>   end
>>   def each
>>     (@start_word..@end_word).each { |word| yield word }
>>   end
>> end
=> :each
>> PasswordBruter.new("hello").each { |word| puts word }
hello
hellp
hellq
hellr
hells
hellt
hellu
hellv
hellw
hellx
....
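
As a smaller sketch of the same idea, the hypothetical Countdown class below defines only each and gets the rest of the Enumerable methods for free.

```ruby
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Enumerable only requires each; everything else is built on top of it.
  def each
    @from.downto(1) { |n| yield n }
  end
end

launch = Countdown.new(5)
launch.to_a                 # => [5, 4, 3, 2, 1]
launch.select(&:even?)      # => [4, 2]
launch.map { |n| n * 10 }   # => [50, 40, 30, 20, 10]
launch.include?(3)          # => true
launch.first(2)             # => [5, 4]
launch.min                  # => 1
```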
 

Global Methods

defined?

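defined? is a keyword rather than a method. It checks whether an expression (such as a Class, Module, Constant, variable or method) is defined, returning a description String or nil: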

 
if defined?(RUBY_ENGINE)
  puts "Running on #{RUBY_ENGINE}"
end
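
A brief sketch of what defined? returns for different kinds of expressions. The engine_name helper and the name NoSuchConstant are assumptions for this example.

```ruby
defined?(String)           # => "constant"
defined?(puts)             # => "method"
defined?($stdout)          # => "global-variable"
defined?(@never_set)       # => nil
defined?(NoSuchConstant)   # => nil (no NameError is raised)

# A common use is guarding code that depends on an optional constant.
def engine_name
  defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'unknown'
end
```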
 

send

Calls a method by name, which lets you choose the method at runtime:

 
obj.send(method_name)
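
A small sketch of dynamic dispatch with send and its stricter sibling public_send. The Probe class and its methods are invented for this illustration.

```ruby
class Probe
  def tcp(port)
    "tcp/#{port}"
  end

  def udp(port)
    "udp/#{port}"
  end

  private

  def secret
    'hidden'
  end
end

probe    = Probe.new
protocol = 'udp'               # e.g. chosen at runtime

probe.send(protocol, 53)       # => "udp/53"
probe.public_send(:tcp, 80)    # => "tcp/80"
probe.respond_to?(:icmp)       # => false, so check before dispatching

probe.send(:secret)            # => "hidden" (send bypasses private)
# probe.public_send(:secret)   # would raise NoMethodError
```

Because send ignores visibility, public_send is the safer choice when the method name comes from untrusted input.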
 

throw / catch

throw / catch: Throw and catch allow one to jump out of nested scopes: throw unwinds the stack to the matching catch block, which then returns the thrown value. This is similar to setjmp/longjmp in C:

 
def callback
  catch(:abort) do
    puts 'yielding ...'
    yield
    puts 'completed!'
    :success
  end
end
 
>> callback { puts 'in the callback' }
yielding ...
in the callback
completed!
=> :success
>> callback { puts 'aborting!'; throw :abort, :failed }
yielding ...
aborting!
=> :failed

raise

Raises an exception:

 
>> raise("error!")
RuntimeError: error!
	from (irb):38
>> raise(ArgumentError,"invalid argument")
ArgumentError: invalid argument
	from (irb):39
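
Exceptions are ordinary classes, so you can define your own by inheriting from StandardError. In the sketch below, AuthError and login are hypothetical names chosen for the example.

```ruby
class AuthError < StandardError
  attr_reader :username

  def initialize(username)
    @username = username
    super("login failed for #{username}")
  end
end

def login(username, password)
  raise AuthError.new(username) unless password == 'letmein123'
  :ok
end

begin
  login('admin', 'wrong')
rescue AuthError => e
  puts e.message   # prints "login failed for admin"
end
```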
 

User Input

STDIN / STDOUT / STDERR

Ruby provides the STDIN, STDOUT and STDERR IO constants. You can use these constants to read input or print output.

 
STDOUT.print "Password: "
 
password = STDIN.readline.chomp   # chomp strips the trailing newline
 
if password.length < 6
  STDERR.puts "Password was too short!"
  exit -1
else
  puts "You entered: #{password}"
end
 

As you've probably guessed, the top-level IO methods are shortcuts: puts writes to $stdout, while gets and readline read from ARGF, which falls back to standard input when no file arguments were given.

Ruby also provides the $stdin, $stdout and $stderr global variables, which allow you to hook/redirect all input/output. It's generally a good idea to use these global variables instead of the constants.
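
One way to see the benefit of the global variables is to redirect $stdout into an in-memory buffer. The capture_output helper below is a sketch written for this article, not a standard method.

```ruby
require 'stringio'

# Runs the block with $stdout pointing at an in-memory buffer
# and returns everything that was printed.
def capture_output
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original   # always restore the previous stream
end

captured = capture_output { puts 'scanning...' }
captured   # => "scanning...\n"
```

Code that writes to the STDOUT constant directly would bypass this redirection, which is why the globals are preferred.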

Command-line Arguments

All command-line arguments are exposed via the ARGV constant.

 
#!/usr/bin/env ruby
puts "You passed the following arguments:"
ARGV.each { |arg| puts "  #{arg}" }
 

Ruby also provides ARGF, a virtual file that concatenates every file named in ARGV, or reads standard input when ARGV is empty:

 
#!/usr/bin/env ruby
 
unless ARGV[0]
  $stderr.puts "usage: #{$0} FILE"
  exit -1
end
 
ARGF.each_line do |line|
  # ...
end
 

If you need to parse command-line options, use the built-in OptionParser library.
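
A minimal OptionParser sketch follows. The option names, the banner text and the parse_options helper are assumptions made for this example; a real script would normally pass ARGV.

```ruby
require 'optparse'

def parse_options(argv)
  options = { :verbose => false, :port => 22 }

  parser = OptionParser.new do |opts|
    opts.banner = 'usage: scan [options] HOST'
    opts.on('-v', '--verbose', 'Print extra output') { options[:verbose] = true }
    # Passing Integer makes OptionParser convert and validate the value
    opts.on('-p', '--port PORT', Integer, 'Port to connect to') { |port| options[:port] = port }
  end

  # parse returns the arguments that were not consumed as options
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_options(%w[-v --port 2222 example.org])
options[:verbose]   # => true
options[:port]      # => 2222
rest                # => ["example.org"]
```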

Environment Variables

You can access and set Environment Variables via the ENV Hash:

 
>> ENV['HOME']
=> "/home/blackhatacademy"
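
Setting and reading variables works much like a Hash, as this sketch shows. SCAN_TARGET is an arbitrary variable name and setting is a hypothetical helper.

```ruby
ENV['SCAN_TARGET'] = 'localhost'   # values must be Strings
ENV['SCAN_TARGET']                 # => "localhost"
ENV.key?('SCAN_TARGET')            # => true

# fetch lets you supply a default for variables that may be unset
def setting(name, default)
  ENV.fetch(name, default)
end

setting('SCAN_TARGET', 'example.org')   # => "localhost"
ENV.delete('SCAN_TARGET')               # unset the variable again
setting('SCAN_TARGET', 'example.org')   # => "example.org"
```

Changes made through ENV affect the current process and any child processes it starts, not the parent shell.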
 

User-defined

Methods

A simple hello world method:

 
 
def hello_world
  puts "Hello, World!"
end
 

When calling a method that takes an argument, the parentheses around the argument are optional.

 
def hello(name)
  puts "Hello, #{name}!"
end
 
>> hello('ohdae')
Hello, ohdae!
=> nil
>> hello 'ohdae'
Hello, ohdae!
=> nil
 

Methods with default arguments:

 
def hello(name='ohdae')
  puts "Hello, #{name}!"
end
 
>> hello
Hello, ohdae!
=> nil
>> hello 'world'
Hello, world!
=> nil
 


Prematurely returning from a method:

 
def add(arg1, arg2) 
  return nil unless arg1
  return nil unless arg2
 
  arg1.to_i + arg2.to_i
end

Note that we don't have to specify a return on the last line of our add method. This is because in Ruby every expression has a value, and the value of the last expression evaluated in a method is its return value.

Classes

Class names must start with a capital letter.

 
class MyClass
 
end
 

Classes can also inherit from other Classes:

 
class MyOtherClass < MyClass
 
end
 

In Ruby, the constructor method is called initialize:

 
class MyClass
  def initialize
    puts "Hello\n"
  end
end
 

Reader/writer methods for variables can also be defined within the class:

 
class MyClass
 
  # defines a foo method
  attr_reader :foo
 
  # defines a bar= method
  attr_writer :bar
 
  # defines baz and baz= methods
  attr_accessor :baz
 
  def initialize(foo,bar,baz)
    @foo, @bar, @baz = foo, bar, baz
  end
end
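
The difference between the three helpers shows up when you use the object from outside. Credential is a hypothetical class made up for this sketch.

```ruby
class Credential
  attr_reader   :username   # read-only from outside
  attr_writer   :password   # write-only from outside
  attr_accessor :host       # readable and writable

  def initialize(username, password, host)
    @username, @password, @host = username, password, host
  end
end

cred = Credential.new('admin', 'letmein123', 'localhost')
cred.username                  # => "admin"
cred.host = 'example.org'      # accessor: writing works
cred.password = 'changed'      # writer: writing works
cred.respond_to?(:password)    # => false, there is no reader
cred.respond_to?(:username=)   # => false, there is no writer
```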
 

Modules

Modules are a way to group methods together, which can be included into multiple Classes. Modules allow you to compartmentalize common behaviours.

 
module HasName
  attr_accessor :name
 
  def initialize(name)
    @name = name
  end
 
  def named?(other_name)
    @name == other_name
  end
 
  def to_s
    @name.to_s
  end
 
end
 
class Person
 
  include HasName
 
end
 
class Computer
 
  include HasName
 
  def initialize(ip,name)
    @ip = ip
 
    super(name)
  end
 
end
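
A compact sketch of how an included module shows up in its host classes. Describable, Host and Service are invented names for this example.

```ruby
module Describable
  # Relies on the including class to provide a name method.
  def describe
    "#{self.class.name.downcase}: #{name}"
  end
end

class Host
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Service
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

Host.new('gateway').describe        # => "host: gateway"
Service.new('ssh').describe         # => "service: ssh"
Host.include?(Describable)          # => true
Host.new('gw').is_a?(Describable)   # => true
Host.ancestors.first(2)             # => [Host, Describable]
```

The module is inserted into the ancestor chain right after the class, which is why super inside a class method can reach the module's version.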
 

Helpful Libraries

Ruby's stdlib provides many useful libraries.

Struct

Struct allows one to easily create Classes with specific attributes:

 
class Point < Struct.new(:x, :y)
end
 
>> p1 = Point.new(10, 20)
=> #<struct Point x=10, y=20>
>> p1.x
=> 10
>> p1.x += 20
=> 30

OpenStruct

OpenStruct is like Struct, but allows for arbitrary attributes:

 
>> require 'ostruct'
=> true
>> config = OpenStruct.new
=> #<OpenStruct>
>> config.x = 10
=> 10
>> config.name = 'foo'
=> "foo"
>> config.bla
=> nil
 

Set

Set is like an Array, but does not allow duplicates:

 
>> require 'set'
=> true
>> s = Set[]
=> #<Set: {}>
>> s << 1
>> s << 2
>> s << 2
>> s << 3
=> #<Set: {1, 2, 3}>
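
Sets also support the usual set algebra, sketched below. The port numbers are arbitrary example data.

```ruby
require 'set'

open_ports     = Set[22, 80, 443]
expected_ports = Set[22, 443, 8080]

both       = open_ports & expected_ports   # intersection: 22 and 443
either     = open_ports | expected_ports   # union: 22, 80, 443 and 8080
unexpected = open_ports - expected_ports   # difference: only 80

open_ports.include?(80)            # => true
Set[22, 443].subset?(open_ports)   # => true
[1, 1, 2, 3, 3].to_set.size        # => 3 (duplicates collapse)
```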
 

Base64

 
>> require 'base64'
=> true
>> Base64.encode64("hello")
=> "aGVsbG8=\n"
>> Base64.decode64("c2VjcmV0\n")
=> "secret"
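
Besides encode64, the module offers variants without the trailing newline and with a URL-safe alphabet. A short sketch using only methods from the base64 library:

```ruby
require 'base64'

Base64.strict_encode64("hello")   # => "aGVsbG8=" (no trailing newline)
Base64.strict_decode64("aGVsbG8=")   # => "hello"

# The URL-safe variant replaces + and / with - and _
raw = "\xFB\xFF".b
Base64.strict_encode64(raw)    # => "+/8="
Base64.urlsafe_encode64(raw)   # => "-_8="
```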
 

Digest

The Digest module provides classes for computing cryptographic hashes, such as MD5 and SHA256:

 
>> require 'digest/md5'
=> true
>> Digest::MD5.hexdigest("hello")
=> "5d41402abc4b2a76b9719d911017c592"
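>> require 'digest/sha2'
=> true
>> Digest::SHA256.hexdigest("hello")
=> "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
>> Digest::SHA256.file('linux.iso').hexdigest   # hashes a file's contents; the result depends on the file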
 

JSON

Ruby also provides a JSON serialization and parsing library:

 
>> require 'json'
=> true
>> JSON.parse("[1,2]")
=> [1, 2]
>> {1 => 2, 2 => 3}.to_json
=> "{\"1\":2,\"2\":3}"
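
A slightly larger sketch of a JSON round trip. The document contents are made-up example data.

```ruby
require 'json'

text = '{"host":"localhost","ports":[22,80]}'

data = JSON.parse(text)
data['host']         # => "localhost"
data['ports'].last   # => 80

# symbolize_names turns the keys into Symbols
symbolized = JSON.parse(text, :symbolize_names => true)
symbolized[:host]    # => "localhost"

# generate produces compact output, pretty_generate adds indentation
JSON.generate(data) == text   # => true
puts JSON.pretty_generate(data)

# Invalid input raises JSON::ParserError
begin
  JSON.parse('{not json}')
rescue JSON::ParserError
  puts 'could not parse input'
end
```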
 

Queue
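
Queue is a thread-safe first-in, first-out queue, typically used to hand work to a pool of threads. The sketch below is one possible pattern; using nil as a stop signal is a convention chosen for this example.

```ruby
require 'thread'   # not needed on modern Rubies, harmless to keep

jobs    = Queue.new
results = Queue.new

workers = 2.times.map do
  Thread.new do
    # pop blocks until an item is available; nil tells the worker to stop
    while (port = jobs.pop)
      results << port * 2
    end
  end
end

[22, 80, 443].each { |port| jobs << port }
workers.size.times { jobs << nil }   # one stop signal per worker
workers.each(&:join)

doubled = []
doubled << results.pop until results.empty?
doubled.sort   # => [44, 160, 886]
```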

Sockets

Simple TCP server:

 
require 'socket'
 
server = TCPServer.new 4444          # Binds to TCP port 4444
 
loop do                              # Starts a loop to wait for connection
  client = server.accept             # Receive and accept client connection
  client.puts "Hack The Planet!"     # Send data to client
  client.close
end
 

Simple TCP client:

 
require 'socket'
 
client = TCPSocket.new 'localhost', 4444      # Open TCP connection to localhost on port 4444
 
while line = client.gets                      # Read input from the socket connection
  puts line                                   # Prints the input we captured with client.gets
end                                      
 
client.close                                  # Closes the socket
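
To try both halves without opening two terminals, the server can run in a background thread of the same process. This is a sketch under that assumption; binding to port 0 asks the operating system for any free port.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
port   = server.addr[1]                  # the port that was actually assigned

# Serve a single client in a background thread
server_thread = Thread.new do
  client = server.accept
  name = client.gets.chomp               # read one line from the client
  client.puts "Hello, #{name}!"          # send a reply
  client.close
end

socket = TCPSocket.new('127.0.0.1', port)
socket.puts 'ohdae'
reply = socket.gets.chomp                # => "Hello, ohdae!"
socket.close

server_thread.join
server.close
```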
 

URI

The URI library handles parsing and crafting of URIs:

 
>> require 'uri'
=> true
>> uri = URI("http://www.blackhatlibrary.net/index.php?title=Ruby&action=edit&section=42")
=> #<URI::HTTP:0x000000010e2080 URL:http://www.blackhatlibrary.net/index.php?title=Ruby&action=edit&section=42>
>> uri.host
=> "www.blackhatlibrary.net"
>> uri.port
=> 80
>> uri.query
=> "title=Ruby&action=edit&section=42"
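
The library can also build URIs and query strings, as in this sketch. The host is the one from the example above; the parameter values are arbitrary.

```ruby
require 'uri'

# Build a query string with proper escaping
query = URI.encode_www_form('title' => 'Ruby', 'action' => 'edit')
# => "title=Ruby&action=edit"

uri = URI::HTTP.build(
  :host  => 'www.blackhatlibrary.net',
  :path  => '/index.php',
  :query => query
)
uri.to_s   # => "http://www.blackhatlibrary.net/index.php?title=Ruby&action=edit"

# And take a query string apart again
pairs = URI.decode_www_form(uri.query)   # => [["title", "Ruby"], ["action", "edit"]]
```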
 

Net::HTTP

open-uri

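open-uri allows one to open URLs as if they were files. On Ruby 2.5 and later, call URI.open; the bare Kernel#open form stopped opening URLs in Ruby 3.0. Larger responses are buffered in a temporary file:

 
>> require 'open-uri'
=> true
>> page = URI.open("http://www.blackhatlibrary.net/")
=> #<Tempfile:/tmp/open-uri20120812-2361-143dlvc>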
 

Gems

Gems are packaged Ruby libraries and scripts. Anyone can create their own gem and publish it to https://rubygems.org.

Gems are installed using the gem install command:

$ gem install foo-bar

Once installed, you can require the gem:

require 'foo/bar'

Nokogiri

Nokogiri is a fast XML/HTML parser built on top of libxml2.

Installing on Debian / Ubuntu:

$ sudo apt-get install libxml2-dev libxslt1-dev
$ gem install nokogiri

Installing on RedHat / Fedora:

$ sudo yum install libxml2-devel libxslt-devel
$ gem install nokogiri
 
require 'nokogiri'
require 'open-uri'
 
doc = Nokogiri::HTML(URI.open("http://www.reddit.com/"))
 
# Print the target of every title link on the page
doc.search("div.entry a.title").each do |link|
  puts link['href']
end
 

Sequel

Sequel is a low-level SQL library for Ruby. It allows you to query tables and insert data, all without having to write raw SQL. Sequel supports SQLite3, MySQL, PostgreSQL, Oracle, etc.

Installing on Debian / Ubuntu:

$ sudo apt-get install libsqlite3-dev
$ gem install sqlite3 sequel

Installing on RedHat / Fedora:

$ sudo yum install sqlite-devel
$ gem install sqlite3 sequel
 
require 'sequel'
 
DB = Sequel.sqlite('file.sqlite3')
DB.create_table(:mytable) do
  Integer :id
  String  :name, :size => 256
end
 
DB[:mytable].insert(:id => 1, :name => "foo")
DB[:mytable].insert(:id => 2, :name => "bar")
DB[:mytable].where(:name => 'bar').all   # fetch the matching rows as an Array of Hashes
 

For more Sequel tips, please see the Sequel Cheat Sheet.
