HBase tips and tricks


  1. irbrc file- configure ~/.irbrc to save the command history of every HBase shell invocation.
  2. Minimal irbrc configuration-

    more ~/.irbrc
    require 'irb/ext/save-history'
    IRB.conf[:SAVE_HISTORY] = 100
    IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb_history"
    # Run IRB's exit hooks when the shell exits so the history file is written.
    Kernel.at_exit do
        IRB.conf[:AT_EXIT].each do |i|
            i.call
        end
    end
  3. Enable debug-level logging, either from inside the shell or with the -d flag at startup
  4. hbase> debug
    ./bin/hbase shell -d
  5. Counters- HBase offers a counter feature. Counters are very useful for collecting statistics.
  6. hbase(main):001:0> create 'account', 'id'
    0 row(s) in 1.1930 seconds
    hbase(main):002:0> incr 'account', '2014', 'id:n', 1
    hbase(main):003:0> get_counter 'account', '2014', 'id:n'
  7. Scan query optimization
  8. A scan reads a range of rows from HBase and is one of the costliest operations.
    Setting the optional startRow and stopRow improves query performance. If neither is set, the scanner iterates over all rows.
    Scans with start and stop keys are much faster because HBase does not have to read the whole table to find the rows that match the query or filter.
    Here are the tricks-

    • Create an HBase table and populate it with data-
    • create 'TS','cf'
      row id cf:desc
      card_number_year_month_day_time_o  transaction_amt  location         type    year  month
      100_2014_06_10_10_932845_ta        100              bangalore        credit  2014  6
      23989_2000_01_11_10_5468756_ta     45843745         bangalore india  debit   2000  5

      2000 1
    • Avoid a full table scan-
    • Find all transactions made with card number X in Bangalore.
      Use a prefix/row-key filter with a regex or substring comparator to set the search condition, and set the start row to 'X' and the stop row to 'X~'.
      Row keys are stored as bytes and sorted lexicographically. The start and stop keys let the scan skip the rest of the table and read only from the regions that hold the key range. Because '~' sorts after the digits, the letters, and '_' in ASCII, the range from 'X' to 'X~' covers the row keys that start with X.
      Retrieving data from HBase with a bounded scan-

      Scan scan = new Scan(Bytes.toBytes("23989"), Bytes.toBytes("23989~"));
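    • Control caching for large scans-
    • setCacheBlocks(false) keeps the blocks read by this scan out of the region server block cache;
      setCaching sets the number of rows fetched per RPC.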
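
The start/stop-row trick above relies only on the byte-wise ordering of row keys, so it can be illustrated without a cluster. The following is a minimal sketch in plain Java that models a table as a sorted set of keys. The class RowKeyRangeSketch, its methods, and the two extra row keys are hypothetical and exist only for this illustration; no HBase classes are involved. It also shows a caveat: the range 'X' to 'X~' picks up any card number that merely begins with X, so appending the '_' separator to both bounds is a safer choice when card numbers vary in length.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Pure-JDK model of a row-key range scan; no HBase classes are used.
final class RowKeyRangeSketch {

    // Unsigned byte-by-byte comparison, the same ordering HBase applies to row keys.
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    // Build a sorted "table" that holds only row keys.
    static NavigableSet<byte[]> sortedKeys(String... keys) {
        NavigableSet<byte[]> set = new TreeSet<>(ROW_ORDER);
        for (String key : keys) {
            set.add(key.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }

    // Return the keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> range(NavigableSet<byte[]> keys, String startRow, String stopRow) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (byte[] key : keys.subSet(start, true, stop, false)) {
            out.add(new String(key, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<byte[]> keys = sortedKeys(
            "100_2014_06_10_10_932845_ta",
            "23989_2000_01_11_10_5468756_ta",
            "239890_2001_03_02_09_11_ta",   // hypothetical row for card 239890
            "24000_2010_01_01_00_1_ta");    // hypothetical row for card 24000

        // Prefix only: card 239890 is returned too, because '0' sorts before '~'.
        // prints [239890_2001_03_02_09_11_ta, 23989_2000_01_11_10_5468756_ta]
        System.out.println(range(keys, "23989", "23989~"));

        // Prefix plus separator: only card 23989 remains.
        // prints [23989_2000_01_11_10_5468756_ta]
        System.out.println(range(keys, "23989_", "23989_~"));
    }
}
```

The row for card 100 and the row for card 24000 fall outside both ranges, which is the part of the table a bounded scan never has to read.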

  9. Get all the rows for account number 23989
    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.RowFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    scan 'TS', {STARTROW=>'23989', STOPROW=>'23989~',FILTER=>RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new('23989'))}

    Use start and stop rows to optimize the scan query.
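
To make the benefit concrete, here is a small plain-Java sketch that counts how many row keys a substring filter has to be evaluated against with and without a start/stop row. It is a model, not HBase code: the class FilterVsRangeSketch, the Outcome holder, and the 1000 filler rows are hypothetical, and the keys are assumed to be ASCII so that Java's String ordering matches HBase's byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Pure-JDK model; it only counts how many row keys each strategy looks at.
final class FilterVsRangeSketch {

    // Outcome of one simulated scan.
    static final class Outcome {
        final List<String> matches = new ArrayList<>();
        int examined = 0;
    }

    // Apply a substring test to every key in [startRow, stopRow).
    // A null bound means "unbounded", like a scan without start/stop rows.
    static Outcome scan(NavigableSet<String> keys, String startRow, String stopRow,
                        String substring) {
        NavigableSet<String> view = keys;
        if (startRow != null) {
            view = view.tailSet(startRow, true);
        }
        if (stopRow != null) {
            view = view.headSet(stopRow, false);
        }
        Outcome out = new Outcome();
        for (String key : view) {
            out.examined++;                      // the filter runs once per visited key
            if (key.contains(substring)) {
                out.matches.add(key);
            }
        }
        return out;
    }

    // One row from the example table plus hypothetical rows for 1000 other cards.
    static NavigableSet<String> sampleKeys() {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("23989_2000_01_11_10_5468756_ta");
        for (int card = 30000; card < 31000; card++) {
            keys.add(card + "_2014_06_10_10_1_ta");
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = sampleKeys();
        Outcome full = scan(keys, null, null, "23989");
        Outcome ranged = scan(keys, "23989", "23989~", "23989");
        System.out.println("filter only:    " + full.examined + " keys examined");
        System.out.println("range + filter: " + ranged.examined + " keys examined");
    }
}
```

Both calls find the same single row, but the unbounded one evaluates the filter against all 1001 keys while the bounded one touches only the key inside the range.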

  11. Count all rows
    count 'TS',  INTERVAL => 10000, CACHE => 1000

    Decrease the CACHE value if the rows are very large.
