Weka - Attribute Selection Measure: Information Gain (ID3)

In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan[1] used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains. - Wikipedia

The data that will be tested is:


We perform attribute selection using Information Gain as the attribute evaluator. Here are the results:
From the above result, we can see that Outlook gives the best split, while Day contributes nothing to the output, so we can remove the Day attribute before testing with Id3.
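As a quick sanity check on what the evaluator computes: information gain is the class entropy minus the weighted entropy after splitting on an attribute. Here is a minimal Java sketch for the Outlook attribute, assuming the classic 14-instance weather data (9 yes / 5 no overall; sunny 2/3, overcast 4/0, rainy 3/2):

public class InfoGainDemo {
    // Entropy (in bits) of a class distribution given as counts.
    static double entropy(double... counts) {
        double total = 0, h = 0;
        for (double c : counts) total += c;
        for (double c : counts)
            if (c > 0) h -= (c / total) * (Math.log(c / total) / Math.log(2));
        return h;
    }

    public static void main(String[] args) {
        double before = entropy(9, 5);                 // ~0.940 bits
        double after = (5.0 / 14) * entropy(2, 3)      // sunny
                     + (4.0 / 14) * entropy(4, 0)      // overcast
                     + (5.0 / 14) * entropy(3, 2);     // rainy
        System.out.printf("InfoGain(Outlook) = %.3f%n", before - after); // ~0.247
    }
}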

For those who don't have Id3 in their Weka, you can download it from the package manager; the package is named "simpleEducationalLearningSchemes".

After you have downloaded it, you can go to Classifier -> trees -> Id3 to test the data. Before you start, you can click on More Options, then click Choose -> PlainText.


After that you can start the Id3 process. Here are the results.

Id3 can't visualize the tree, but we can draw it based on the result given above.

Here are the trees. From the information gain, the best split is the Outlook attribute, so Id3 uses Outlook as the root node, followed by temperature and windy.
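For completeness, Id3 can also be run from the Weka Java API once the package is installed; a minimal sketch, assuming weather.nominal.arff as the data file:

import weka.classifiers.trees.Id3;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Id3Demo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("weather.nominal.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // class attribute (play) is last

        Id3 tree = new Id3();         // Id3 needs nominal attributes and no missing values
        tree.buildClassifier(data);
        System.out.println(tree);     // prints the tree as text, rooted at outlook
    }
}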

That's all. Thanks!

Weka - Information Gain and Gain Ratio Using Soybean Database

Notes: The large soybean database (soybean-large-data.arff) and its corresponding test database (soybean-large-test.arff) have been combined into a single file (soybean-large.arff).

1. Title: Large Soybean Database

2. Sources:
      (a) R.S. Michalski and R.L. Chilausky "Learning by Being Told and
          Learning from Examples: An Experimental Comparison of the Two
      Methods of Knowledge Acquisition in the Context of Developing
      an Expert System for Soybean Disease Diagnosis", International
      Journal of Policy Analysis and Information Systems, Vol. 4,
      No. 2, 1980.
      (b) Donor: Ming Tan & Jeff Schlimmer (Jeff.Schlimmer@cs.cmu.edu)
      (c) Date: 11 July 1988

3. Past Usage:
     1. See above.
     2. Tan, M., & Eshelman, L. (1988). Using weighted networks to represent
        classification knowledge in noisy domains.  Proceedings of the Fifth
        International Conference on Machine Learning (pp. 121-134). Ann Arbor,
         Michigan: Morgan Kaufmann.
         -- IWN recorded a 97.1% classification accuracy
            -- 290 training and 340 test instances
      3. Fisher,D.H. & Schlimmer,J.C. (1988). Concept Simplification and
         Predictive Accuracy. Proceedings of the Fifth
         International Conference on Machine Learning (pp. 22-28). Ann Arbor,
         Michigan: Morgan Kaufmann.
         -- Notes why this database is highly predictable

4. Relevant Information Paragraph:
     There are 19 classes, only the first 15 of which have been used in prior
     work.  The folklore seems to be that the last four classes are
     unjustified by the data since they have so few examples.
     There are 35 categorical attributes, some nominal and some ordered.  The
     value ``dna'' means does not apply.  The values for attributes are
     encoded numerically, with the first value encoded as ``0,'' the second as
     ``1,'' and so forth.  An unknown value is encoded as ``?''.

5. Number of Instances: 683

6. Number of Attributes: 35 (all have been nominalized)

7. Attribute Information:
    -- 19 Classes
     diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot,
     phytophthora-rot, brown-stem-rot, powdery-mildew,
     downy-mildew, brown-spot, bacterial-blight,
     bacterial-pustule, purple-seed-stain, anthracnose,
     phyllosticta-leaf-spot, alternarialeaf-spot,
     frog-eye-leaf-spot, diaporthe-pod-&-stem-blight,
     cyst-nematode, 2-4-d-injury, herbicide-injury.   

    1. date:        april,may,june,july,august,september,october,?.
    2. plant-stand:    normal,lt-normal,?.
    3. precip:        lt-norm,norm,gt-norm,?.
    4. temp:        lt-norm,norm,gt-norm,?.
    5. hail:        yes,no,?.
    6. crop-hist:    diff-lst-year,same-lst-yr,same-lst-two-yrs,
                        same-lst-sev-yrs,?.
    7. area-damaged:    scattered,low-areas,upper-areas,whole-field,?.
    8. severity:    minor,pot-severe,severe,?.
    9. seed-tmt:    none,fungicide,other,?.
   10. germination:    '90-100','80-89','lt-80',?.
   11. plant-growth:    norm,abnorm,?.
   12. leaves:        norm,abnorm.
   13. leafspots-halo:    absent,yellow-halos,no-yellow-halos,?.
   14. leafspots-marg:    w-s-marg,no-w-s-marg,dna,?.
   15. leafspot-size:    lt-1/8,gt-1/8,dna,?.
   16. leaf-shread:    absent,present,?.
   17. leaf-malf:    absent,present,?.
   18. leaf-mild:    absent,upper-surf,lower-surf,?.
   19. stem:        norm,abnorm,?.
   20. lodging:        yes,no,?.
   21. stem-cankers:    absent,below-soil,above-soil,above-sec-nde,?.
   22. canker-lesion:    dna,brown,dk-brown-blk,tan,?.
   23. fruiting-bodies:    absent,present,?.
   24. external decay:    absent,firm-and-dry,watery,?.
   25. mycelium:    absent,present,?.
   26. int-discolor:    none,brown,black,?.
   27. sclerotia:    absent,present,?.
   28. fruit-pods:    norm,diseased,few-present,dna,?.
   29. fruit spots:    absent,colored,brown-w/blk-specks,distort,dna,?.
   30. seed:        norm,abnorm,?.
   31. mold-growth:    absent,present,?.
   32. seed-discolor:    absent,present,?.
   33. seed-size:    norm,lt-norm,?.
   34. shriveling:    absent,present,?.
   35. roots:        norm,rotted,galls-cysts,?.

-------------------------------------------------------------------------------------------------
For this task, we want to find the 5 and 10 best attributes using Information Gain and Gain Ratio, and do some analysis of both results.

1) Open Weka (I am using Weka 3.7) and click on Explorer.



2) Load the data from the datasets in the Weka directory:
C:\Program Files\Weka-3-7\data\soybean.arff


3) Click on the Select Attributes tab.



4) Click on Choose [Attribute Evaluator -> InfoGainAttributeEval], [Search Method -> Ranker].


5) Back to our task: we need to find the 5 and 10 best attributes. First I will show how to get the best 5 attributes; afterwards we just repeat this step for the 10 best attributes.

Click on Ranker and type 5 in numToSelect. Then click OK.


6) For Attribute Selection Mode, by default Use full training set is ticked, so don't change anything.
Click Start, and the result will be displayed on the right-hand side.


From the result, we get the 5 best attributes based on Information Gain and the Ranker search method.

Ranked attributes:
 1.1517   22 canker-lesion
 1.0129   15 leafspot-size
 0.9852   29 fruit-spots
 0.8684   13 leafspots-halo
 0.8535   21 stem-cankers
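The same ranking can also be produced programmatically; a minimal sketch with the Weka Java API, assuming soybean.arff is in the working directory:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectTopAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("soybean.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);  // class attribute is last

        Ranker ranker = new Ranker();
        ranker.setNumToSelect(5);                      // or 10

        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new InfoGainAttributeEval()); // or new GainRatioAttributeEval()
        sel.setSearch(ranker);
        sel.SelectAttributes(data);
        System.out.println(sel.toResultsString());     // same ranked list as the Explorer
    }
}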
Next, repeat steps 4-6 for:

Information Gain - 10 best attributes
Gain Ratio - 5 best attributes
Gain Ratio - 10 best attributes

Below are the results.

Information Gain - 10 best attributes

Gain Ratio - 5 best attributes

Gain Ratio - 10 best attributes

Newbie's analysis:

This analysis just shows that there are differences in the ranked attributes between Information Gain and Gain Ratio.

For the 5 best attributes:

Information Gain:
Ranked attributes:
 1.1517   22 canker-lesion
 1.0129   15 leafspot-size
 0.9852   29 fruit-spots
 0.8684   13 leafspots-halo
 0.8535   21 stem-cankers

Gain Ratio:
Ranked attributes:
 0.944   27 sclerotia
 0.944   26 int-discolor
 0.833   18 leaf-mild
 0.773   15 leafspot-size
 0.753   35 roots

The best attribute for Information Gain is canker-lesion, while for Gain Ratio it is sclerotia. There is one attribute common to both top-5 lists, which is leafspot-size.

For the 10 best attributes:

Information Gain:
Ranked attributes:
 1.1517   22 canker-lesion
 1.0129   15 leafspot-size
 0.9852   29 fruit-spots
 0.8684   13 leafspots-halo
 0.8535   21 stem-cankers
 0.8504   14 leafspots-marg
 0.8437   28 fruit-pods
 0.6918   19 stem
 0.6715    1 date
 0.6265   11 plant-growth

Gain Ratio:
Ranked attributes:
 0.944   27 sclerotia
 0.944   26 int-discolor
 0.833   18 leaf-mild
 0.773   15 leafspot-size
 0.753   35 roots
 0.743   14 leafspots-marg
 0.702   13 leafspots-halo
 0.702   12 leaves
 0.698   19 stem
 0.678   11 plant-growth

This is the same as above, just with 5 more attributes. For this one, there are 5 common attributes: leafspot-size, leafspots-halo, leafspots-marg, stem and plant-growth.

That's all. Thanks!


Different ordered data, different result?


The answer is yes!

I will show you my results from testing with my altered state of consciousness data. I tried it in Weka using a Multilayer Perceptron with the same parameters.

Here are the results: the same data, but in a different order:





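To reproduce this effect yourself, shuffle the same dataset with different seeds before splitting and training. Below is a minimal sketch; the data file name is a placeholder:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OrderMatters {
    public static void main(String[] args) throws Exception {
        for (int seed : new int[] {1, 2}) {            // two different orderings
            Instances data = new DataSource("mydata.arff").getDataSet(); // placeholder name
            data.setClassIndex(data.numAttributes() - 1);
            data.randomize(new Random(seed));          // same data, different order

            int trainSize = (int) Math.round(data.numInstances() * 0.9);
            Instances train = new Instances(data, 0, trainSize);
            Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

            MultilayerPerceptron mlp = new MultilayerPerceptron(); // identical parameters
            mlp.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(mlp, test);
            System.out.printf("seed %d: %.2f%% correct%n", seed, eval.pctCorrect());
        }
    }
}

Both runs use identical MLP parameters; only the instance order differs, yet the accuracies can come out different.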

Weka - Attribute Selector Classifier

In Weka, there are three techniques to perform attribute selection:
  • native approach, using the attribute selection classes directly
  • using a meta-classifier
  • the filter approach
This time, I will be using the meta-classifier. Basically, the meta-classifier (AttributeSelectedClassifier) first reduces the attributes; the reduced attribute set is then used by another method.

For example:

You have a data set whose columns are:
  • name
  • age
  • smoking
  • heart rate
  • no. tel

After applying the Attribute Selected Classifier to the data, the attributes are reduced to:
  • age
  • smoking
  • heart rate
These attributes will then be used by another method such as Multilayer Perceptron, Naive Bayes or any other method. That's it.
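In code, the same pipeline looks roughly like this; a minimal sketch where the evaluator, search method, base classifier and data file are illustrative choices, not a fixed recipe:

import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AscDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("patients.arff").getDataSet(); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(new InfoGainAttributeEval()); // how attributes are scored
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(3);                      // keep the 3 best attributes
        asc.setSearch(ranker);
        asc.setClassifier(new NaiveBayes());           // method applied after reduction

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 10, new java.util.Random(1));
        System.out.println(eval.toSummaryString());
    }
}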

Practical Session :

Open your Weka and load any data, or you can try downloading data from here.

After that, go to the Classify tab.


Then click the Choose button -> meta -> AttributeSelectedClassifier.


You can change the method; for example, I chose Linear Regression.


Just click OK, then choose any of the Test Options; I chose Percentage split, with 70% for the training set and 30% for testing.


Thank you.

Source : https://weka.wikispaces.com/Performing+attribute+selection

Final Year Project Progress - Week 2

So this week, I just focused on choosing the best method for my FYP. My project is about predicting the altered state of consciousness, from data with 31 source (input) attributes and 2 target (output) attributes.

Actually, this project has been done before by my senior, who used a neural network to predict both targets. So now I have to use other methods to predict the targets. I have done some research regarding predictive modelling.

https://en.wikipedia.org/wiki/Predictive_modelling

And also other sources that may be related :

https://www.quora.com/What-are-some-Machine-Learning-algorithms-that-you-should-always-have-a-strong-understanding-of-and-why

http://www.tutorialspoint.com/data_mining/dm_classification_prediction.htm

http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/

Based on the research that I've done, for example in the Quora link given above, Sean Owen encourages using Random Forest for classification/regression. Another method that caught my attention is Naive Bayes.

With the data that was given to me by my supervisor (Shamimi A. Halim), I started to play around with it in Weka.

Four methods I used in this research:

1) Multilayer Perceptron (Backpropagation)
2) Naive Bayes
3) Random Forest
4) Logistic Regression


Multilayer Perceptron

Want to learn more :
https://en.wikipedia.org/wiki/Multilayer_perceptron

For this trial-and-error research, the data has been preprocessed, and I focus on just one output, which is status (Alive or Dead). 90% of the data set is used for training and the other 10% for testing. The total number of instances is 204.

Parameter :


Result :


From the above result, I only got 70% accuracy.

So after trying other parameters, I got the best (maybe?) parameters, which use 3 hidden layers.
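In Weka's MultilayerPerceptron, the hidden layer structure is the hiddenLayers parameter (the -H option), a comma-separated list of layer sizes. A minimal sketch; the node counts here are illustrative, not the values from my screenshot:

import weka.classifiers.functions.MultilayerPerceptron;

public class MlpParams {
    public static void main(String[] args) {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("10,10,10"); // three hidden layers; sizes are illustrative
        mlp.setLearningRate(0.3);        // Weka's default
        mlp.setTrainingTime(500);        // epochs; Weka's default
    }
}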

Parameter :


Result :



From the result I got 85% accuracy.


Naive Bayes

Want to learn more :
https://en.wikipedia.org/wiki/Naive_Bayes_classifier

No parameters.

Result :


From the result I got 75% accuracy.

Random Forest 

Want to learn more :
http://www.listendata.com/2014/11/random-forest-with-r.html
https://en.wikipedia.org/wiki/Random_forest 

Parameter :





Result :


From the above result, I only got 70% accuracy.

So after trying other parameters, I got the best (maybe?) parameters, with numFeatures (the number of features considered at each split) set to 6 and seed (which seeds the random number generator, so it controls the randomness) set to 5.
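For reference, the same two parameters can also be set through the Weka 3.7 Java API; a minimal sketch:

import weka.classifiers.trees.RandomForest;

public class RfParams {
    public static void main(String[] args) {
        RandomForest rf = new RandomForest();
        rf.setNumFeatures(6); // attributes randomly considered at each split
        rf.setSeed(5);        // seeds the RNG, making the run reproducible
    }
}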

Parameter :


Result :


From the image above, we got 85% accuracy, which is the same as the Multilayer Perceptron.

Logistic Regression

So why did I choose Logistic Regression for this trial-and-error? Based on the definition in Wikipedia:

In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical.

My output, which is Alive or Dead, is categorical. That's why I tried this method too.

Parameter :


Result :


Yeah! 85% accuracy, the same as the Multilayer Perceptron and Random Forest results.

So what now? I don't know. Lol. Maybe I have to study the algorithms before deciding which method is suitable and efficient for the data.

jjencode - Javascript Obfuscation

Today, I tried a challenge where I needed to decode jjencoded text. I knew that it was JavaScript obfuscation, but I didn't know where to find the correct decoder since I was searching with the wrong keyword. Lol!

Then, when I found the decoder, I tried to decode the text and got the flag. An example is the challenge soooomixeddd, where I needed to decode jjencode.

So how do you make JavaScript unreadable? This website can help you do so. Actually, there are many websites/tools that provide this service. To decode it, you can use tools such as python-jjdecoder or Decoder-JJEncode.

For example :

Javascript code :

window.alert("Hello Programmer!");

jjencode :

$=~[];$={___:++$,$$$$:(![]+"")[$],__$:++$,$_$_:(![]+"")[$],_$_:++$,$_$$:({}+"")[$],$$_$:($[$]+"")[$],_$$:++$,$$$_:(!""+"")[$],$__:++$,$_$:++$,$$__:({}+"")[$],$$_:++$,$$$:++$,$___:++$,$__$:++$};$.$_=($.$_=$+"")[$.$_$]+($._$=$.$_[$.__$])+($.$$=($.$+"")[$.__$])+((!$)+"")[$._$$]+($.__=$.$_[$.$$_])+($.$=(!""+"")[$.__$])+($._=(!""+"")[$._$_])+$.$_[$.$_$]+$.__+$._$+$.$;$.$$=$.$+(!""+"")[$._$$]+$.__+$._+$.$+$.$$;$.$=($.___)[$.$_][$.$_];$.$($.$($.$$+"\""+"\\"+$.__$+$.$$_+$.$$$+"\\"+$.__$+$.$_$+$.__$+"\\"+$.__$+$.$_$+$.$$_+$.$$_$+$._$+"\\"+$.__$+$.$$_+$.$$$+"."+$.$_$_+(![]+"")[$._$_]+$.$$$_+"\\"+$.__$+$.$$_+$._$_+$.__+"(\\\"\\"+$.__$+$.__$+$.___+$.$$$_+(![]+"")[$._$_]+(![]+"")[$._$_]+$._$+"\\"+$.$__+$.___+"\\"+$.__$+$._$_+$.___+"\\"+$.__$+$.$$_+$._$_+$._$+"\\"+$.__$+$.$__+$.$$$+"\\"+$.__$+$.$$_+$._$_+$.$_$_+"\\"+$.__$+$.$_$+$.$_$+"\\"+$.__$+$.$_$+$.$_$+$.$$$_+"\\"+$.__$+$.$$_+$._$_+"!\\\");"+"\"")())();
jjdecode :

 window.alert("Hello Programmer!");

Simple and clean. 

UiTM CTF 2015 - soooomixeddd

You are given a text file containing this script. Obviously it is JavaScript.

<script language=javascript>document.write(unescape('%3C%73%63%72%69%70%74%20%6C%61%6E%67%75%61%67%65%3D%22%6A%61%76%61%73%63%72%69%70%74%22%3E%66%75%6E%63%74%69%6F%6E%20%64%46%28%73%29%7B%76%61%72%20%73%31%3D%75%6E%65%73%63%61%70%65%28%73%2E%73%75%62%73%74%72%28%30%2C%73%2E%6C%65%6E%67%74%68%2D%31%29%29%3B%20%76%61%72%20%74%3D%27%27%3B%66%6F%72%28%69%3D%30%3B%69%3C%73%31%2E%6C%65%6E%67%74%68%3B%69%2B%2B%29%74%2B%3D%53%74%72%69%6E%67%2E%66%72%6F%6D%43%68%61%72%43%6F%64%65%28%73%31%2E%63%68%61%72%43%6F%64%65%41%74%28%69%29%2D%73%2E%73%75%62%73%74%72%28%73%2E%6C%65%6E%67%74%68%2D%31%2C%31%29%29%3B%64%6F%63%75%6D%65%6E%74%2E%77%72%69%74%65%28%75%6E%65%73%63%61%70%65%28%74%29%29%3B%7D%3C%2F%73%63%72%69%70%74%3E'));dF('%264Diunm%264F%261B%264Difbe%264F%261B%264Dtdsjqu%2631uzqf%264E%2633ufyu0kbwbtdsjqu%2633%264F%261Bgvodujpo%2631tipx%60bmfsu%2639%263%3A%261B%268C%261Bbmfsu%2639%2633F%5BXIJP%5B8PCVIBEJLNW4HD4CJN66HT4UHOSRYJ%5BKJNKRYH%5BKXHSQXJ%5BMEO6THLLCIOKMHJ%5BELH%5BZFTSLRHGCDX63NKKWHTVDRH6KWLUEOJSVFDOEXJF5IL3UIPOOHTS%5BROW5H54KSHJ5UTPLFNWYUTPDGLVZHZWSSLSZEF5MDPS3YHU%5BWGO5WJ6SWNK6FV4TONRYVL%5BSYHOFEBR4XNFZXTXTRHJZF334RJZZUTPLNPO%5BDXNDHHBWWTW%5BUKO3X37CZOZ4IN7TRON%5BXD%5BTFNSHYRWU3KOJF5N%5BTQKSUR633KWUHFS4HKJZIRWTJHKLHNN4FP64IJUTONO3DXUCXO6MXH4M%5BKO4EFOTLMCNXLPDGJGTEL%5BKZQKYIRZLGP%5B5GVWK%5BHWKFHM3QPWBVT4SMPSFH534CJGOEP3%5BSPNYWN6CVPVZHD3EUHGXDXR3%5BPR5FJXDRG6HELT3FO%5BJELTCQN6YHRW3ENKCUJWMRG6UWDZ%5BWKCWIBPK%5BNS3D7NMDOKTXPUEXMBZYBOTZIC%5BVF%5BUHKKTGFSUDOJ4UNL3SK53EDS4KJK3YT6MJQK3DX7KUP6VXTZTGJGSUJTSWMFYXR%5BT%5BNNZXXSCZGOGEDO4FKC3GVUCMHGOHNVESJWNID3CMLGOENO4CPC3UJ4UOH%5B3FHO3KO%5BWFPUDRJFWX5ZLOLG6GLTLENK5VDWKZJSHHTXUOHSFINVMVMBYVLXUEKOXYLXUNNF4IRUDIQGDXDNTVNSYVZODSHG5DXVMQHWGYN7EDPGSX3XTKNN%5BGTL4QLCTWLXEDIG5FN4EMHOYXNVTSHOIHX4MMJOHXF4MXLSSITWUWPGXYF3%5BMGOHUJNTTORYXT7UHQCLHN7CUH54YDOE3LC5XJL%5B%5BPR5YFM43JNYXTVMXQCYED3TFQFWVNTDPLWHHN%5BTCP%5BRVHUTXLC%5BXFT3QLCKF7U3FNR5YFSDUOOJHDODXNGWYRWSMNKKVP4TTNV%5BIDVUSHOXGD3DROK4IPNCRPSYHZPE3PKSGB5M3L6JVZXSRGO4EPV3NJ6VHFW%5BWH6LHP43IPWHEDL33K%5BRWL7U3KSSIH4KQNW%5BGPOUWNK4DXZMMLKXXJ44TKSVXN4KMJOUUH4UKPWHEHULMNS4EFXEWKKNXF%5BLFH55WN3K%5BOSWHJ3KZGO5H53KRJ%5B5D77EFHKTH7WTWK%5BLIBOKRLZZWR6C%5BH6ZYVSMNKR%5BUPXEEGOCYHNSMNV3ET7UJHRYYLTEOO6CV56CRNGSHRV33NF%5BIV5%5B%5BPSMXLT4%5BN5%5BYFNUUN%5B3VXWETLSKHV7UPJGSIH5SXPO3VT4SYHB3GH%5BSTNOTGR3LQLOXHVUDLP%5BKVRXEMQGFXPXT%5BHS4WD%5BCWKG4UN6DMMG%5BWT7EINR3GD643LWTWP6T3H%5B3TXZLPIB3FZ4TPIC5ED43TO%5BUIDWUUPKYFXXEXMK4GJ7SUHODH3%5BK%5BNR4V5SEHGOGWNZSWNOSWTXEMH%5B5FR33QGOVYFU3PNZ3FX3UEQCXHLZKUH%5BLXV7TVKJ4GRZ4THCCWJS%5BTL6KXLNUXPGRXDUKULG%5BD7V%5BSO%5BTHN7MONR3IBTEZHF4VR63ROZ5HRTCQOB%5BXR7UWNODIPZ4RPSTXF6EFHFYUJNMCLGYEHNLZJK%5BHLS4XPWTUDVU%5BIFWYBW3YJOZHJO3LLC3XPVSXMJ%5BVP4TRJKFWV%5B3XOR5V5ZKULSOGBOMEGOWERVDOOGLHZ3LFH6%5BD74MCKSVVNUTZOK4EPZUTQCDTXXDMPW3EBN3KLOTHJNCSK6%5BHDPLVH65INV4XPWXF5O4QJ%5BXXPV4FOKYHXSSMMGYDXO33HOZIR4TJNR4XVR4VQCFEN4EZH%5B4HR%5BDLOG6F57USLR3FX6TNP%5BCT7WDQPKHEHXMXMGXEPZLPGONXLV4PN%5B5WD7E3LRWYF6KTIBZWRUS%5BHSNILN4YNV5YNODSGORT7V%5BWJV%5BVRVDWOKXYDL%5BQMKYIPZSMKCCWPTEYN%5BNID7MHNWLWTOUNPR4GLOM%5BPWJH3%5BUIJOGIPWCZOCMITZTIPKTXZL%5B%5BQK%5BDXO%5BYP%5B4XDL4PHFWX3OEJOG6FR3MTMB%5BELSSRKCLVV%5BLTLZZFHUCQLOJGPSEHHGYINRSUJWSTX5EKICNETV%5BWN5%5BUNSUFMCKXR%5BUKJGDHTNDROGWEPNUVNKXGNVEYKWTF3L3IG5ZVZ63RKSSEF%5BMEJSTWHU4LL%5BTFXSCYP6UHJXSVH6TV5TDCH5ZDXOULL54GNO4%5BJKOIFTLFNOMX533XNN%5BYRNUVN%5BJFF6EUKS3VDTKQQKLYLS%5BYHS6FFRULOORYV4CVNOLXNOLYNW5FPN4MLCCUDR3GQCYIVUTFJ53YR%5BE%5BQFZYBM%5BQH%5BGHHS%5BVGN%5BH5PEKOGBV36CZNZ3HPWEVHNYTPLKKGF6R3DS8F%5BUYJPZ%264E%2633%263%3A%264C%261B%268E%261B%264D0tdsjqu%264F%261B%264D0ifbe%264F%261B%264Dcpez%264F%261B%261B%264Djoqvu%2631uzqf%264E%2633cvuupo%2633%2631podmjdl%264E%2633tipx%60bmfsu%2639%263%3A%2633%2631wbmvf%264E%2633Hppe%2631Mvdl%2632%2632%2632%2631%266F%60%266Fz%2633%26310%264F%261B%261B%264D0cpez%264F%261B%264D0iunm%264F1')</script>
Then we change the file extension from .txt to .html and open the file with a web browser. You will see a button "Good Luck..."; press the button and a string pops up. Copy and paste it anywhere to see the whole string.


From here you need to differentiate between Base32 and Base64. Base32 strings use only uppercase letters and the digits 2-7, while Base64 mixes upper and lower case letters, digits and other characters (+ and /). Both can end with =, == or nothing. So this one is Base32; when we decode it, we get PHP code.
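A minimal sketch of that distinction in Java; Base32 decoding is not in the JDK, so this assumes Apache Commons Codec on the classpath, and the input string is a placeholder:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import org.apache.commons.codec.binary.Base32; // from commons-codec

public class BaseDecode {
    public static void main(String[] args) {
        String s = "NBSWY3DP";                    // placeholder: "hello" in Base32
        if (s.matches("[A-Z2-7]+=*")) {           // heuristic: Base32 alphabet only
            System.out.println(new String(new Base32().decode(s), StandardCharsets.UTF_8));
        } else {                                  // otherwise try Base64
            System.out.println(new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8));
        }
    }
}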

So just edit the code to display the string that will be deobfuscated, open the file with your browser, and you will get this string.


After that, I tried to decode it with Base64 and got another weird string.

After doing some research on that weird string, I found a website that mentioned obfuscation. I didn't know which language's obfuscation this was. I tried PHP: no luck. Tried JavaScript: yes!
Here is the output :
5930753472332430213333377a44754433436f6e4772415432546831734953666c3467thisisnottheanswerbutyouareNEAR 
So here we just decode the hexadecimal and get the flag.
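Decoding the hexadecimal is mechanical: each pair of hex digits is one ASCII byte. A minimal sketch (only a prefix of the string above is used here):

public class HexDecode {
    public static void main(String[] args) {
        String hex = "593075347233"; // prefix of the string above
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16)); // one byte per pair
        }
        System.out.println(out); // prints Y0u4r3
    }
}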

Uniten Hack@10 Binary ex03

For this challenge, I will explain what I did to solve it, even though it is quite easy if you really understand how the program works.

So, of course, the first thing we do is check whether it is packed or not with PEiD.


So it shows nothing found. Does that mean nothing to worry about? Haha. As a second step, we run strings on the program to check for any weird/interesting strings. Nothing interesting for me except this.


Then we open the program with OllyDbg and try to run it with a dummy input.


Of course, we get "Wrong". Then I tried searching for referenced strings, but got no clue since the outputs "Wrong" and "Correct!!" did not show up. So I searched in the Memory Map (Alt+M), then double-clicked .text.


After searching and searching, I found the "Correct" and "Wrong" strings!


So I set a breakpoint at the JBE instruction at 0x0119292C, then ran the program again with a dummy input to check whether it stops at our breakpoint or not.

No! It still displayed the "Wrong" status, which means it didn't stop at our breakpoint. When we click on the "Wrong" section, we can see in the hint pane that the jumps to the "Wrong" section come from two addresses: 0x01192915 and 0x0119292C.



0x0119292C is the breakpoint we set just now, and the other one is not, so we need to set a breakpoint at 0x01192915 too, which is a JNZ instruction. After that, we just give a dummy input to see whether it stops at our new breakpoint.

Yeah, it stops at 0x01192915! The instruction there compares register AX with BX; if they are equal, execution continues to the next instruction, otherwise it goes to the "Wrong" section.

So we just flip the zero flag so that we can continue, and we reach our second breakpoint, where EAX is compared with the constant value 0F (15 in decimal) followed by JBE (jump if below or equal), so I assume it compares the length of the string. If the string length is below or equal to 15, it goes to the "Wrong" section, else it continues to the "Correct" section. From there we just reverse the steps to get the string.

Working backwards from our first breakpoint, we know it compares the AX and BX registers, so we need to know how AX and BX get their values. From the CPU window, there are 2 functions that are called before the comparison happens.


So we set a breakpoint at both functions and then analyse what happens when each function is called with our dummy input.

Then we reach the first function and press F7 to step into it.

Here is the procedure of function 1.


So we just step through the function procedure and then return. From my understanding, the algorithm just encodes the string input into 4 bytes. Below is the C code that I translated.


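The screenshot above shows my translation. Purely as a hypothetical illustration (this is not the challenge's actual routine), a byte-folding function of this general shape might look like:

import java.nio.charset.StandardCharsets;

public class FoldSketch {
    // Hypothetical example only: fold an input string into a 32-bit value (EAX),
    // whose low 16 bits (AX) would later be compared against a target like 7EEF.
    static int encode(String s) {
        int acc = 0;
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            acc = (acc << 3) ^ (b & 0xFF); // mix each character into the accumulator
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", encode("dummy input"));
    }
}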
Then we go to the second function and try to understand the algorithm. Based on my understanding, the EAX register changes to 7EEF. Also, 60 bytes are reserved on the stack, and we can see that values are placed on the stack out of order. After all the values are placed on the stack, XOR instructions run around 0x28 (40) times.



Then it returns and compares the AX register with the BX register. So that's how the program works:

1) Encode the string into 4 bytes and return it. EAX holds the 4-byte value, which is then moved to the EBX register.

2) The second function places some values on the stack, does some XORs, then returns.

3) Compare AX and BX.

I assume that our string may be 40 characters. Why? Because it would be weird to reserve those bytes on the stack and XOR 40 times otherwise.

Also, at step 3, the AX value is always 7EEF, while the BX value depends on our input. So if we could craft a string that generates the value 7EEF after encoding, it would be nice!

So I tried to reverse the encoding function, but no luck! Haha. I spent around 2 days reversing the algorithm. I also put a breakpoint at the first instruction of both functions.

First function


Second function


Then my luck came: when I restarted the program and ran it to each breakpoint, I found an interesting string in the first function's procedure before the program even started!



"This message is encrypted with blowfish"
I just copied the string and fed it to our program.


Yeah! Finally got the answer.

So I assume this whole algorithm is the Blowfish cipher; that is why it was hard to reverse. Haha.

But... my technique was not so efficient, since I was just lucky to see the string before the program started. Still, it was a fun challenge!