Friday, June 24, 2011

Data Mining 2011


Today I got my results from Data Mining Cup 2011. I placed 15th (out of 35) in the first task, and didn't appear in the second task's table at all (I'd like to know why).
Still, my results weren't so bad for the simple algorithm I used (more on it later):
Number  Name                              Score  Comment
1.      TU_Dortmund_1                     69835
...
12.     Uni_Siberian_Telecommunication_1  60550
13.     TU_Wien_1                         51704  <-- strange jump in scores
...
15.     Uni_Kharkov_1                     50627  <-- me
...
30.     Uni_Chile_2                       21018
...
35.     Inst_Telkom_2                     1230
So, as you can see, I ended up in the middle group, and I think everyone in this group used pretty much the same algorithm - modifications of nearest neighbors. But the top 12 used a different algorithm, and I'll try to figure out what it was.
For now I'll describe my algorithm.
For the learning step I built a hash map where the key is an item number and the value is another hash map. The keys of the inner map are the items that were viewed\ordered in the same session as the initial key, and its values are the counts of those views\orders.
For example:
1000|1|0
1000|2|0
1000|3|0
1001|1|0
1001|2|0
As a result I'll have {'1': {'2': 2, '3': 1}, '2': {'1': 2, '3': 1}, '3': {'1': 1, '2': 1}}.
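The learning step can be sketched in Python roughly like this. It is only a minimal sketch, not my contest code: the function name build_cooccurrence is made up for the example, and the third column of each row is simply ignored here.

```python
from collections import defaultdict


def build_cooccurrence(rows):
    """Build {item: {other_item: count}} from 'session|item|flag' rows.

    Hypothetical helper for illustration; the third column is ignored.
    """
    # Group item ids by session id.
    sessions = defaultdict(list)
    for row in rows:
        session_id, item_id, _flag = row.split("|")
        sessions[session_id].append(item_id)

    # Count how often each pair of distinct items shares a session.
    counts = defaultdict(lambda: defaultdict(int))
    for items in sessions.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1

    # Convert to plain dicts so the result is easy to print and compare.
    return {item: dict(others) for item, others in counts.items()}


rows = ["1000|1|0", "1000|2|0", "1000|3|0", "1001|1|0", "1001|2|0"]
cooccurrence = build_cooccurrence(rows)
```

On the five example rows this gives the same map as written above. If an item repeats inside one session, this sketch counts it every time.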
When I need to produce test results, for each session I gather the items that were already viewed\ordered and merge their values from this hash map. Then I sort by value and return the top 3 items that weren't already in the session.
For example:
1003|1|0
1003|2|0
The merged hash map will be: {'1': 2, '2': 2, '3': 2, '4': 1, '5': 3}. Next, remove 1 and 2 because they are already in the current session: {'3': 2, '4': 1, '5': 3}. Sort by value and return the keys: [5, 3, 4].
But if there are fewer than 3 values in the resulting list, I additionally return top selling items. For example, if the list after the sort step was [3], then after adding top sellers it will be [3, 10, 15], where 10 and 15 are the top 2 selling items, which were calculated in the learning step.
Additionally, I used weights for views\add to cart\orders - 1, 5 and 10 - when building the learning hash map.
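The prediction step with the top sellers fallback can be sketched like this. Again it is a simplified sketch under assumptions: the name recommend and its parameters are made up for the example, the map passed in has the same shape as the learning hash map described above, and the 1/5/10 weights are left out.

```python
def recommend(cooccurrence, session_items, top_sellers, n=3):
    """Return up to n items for a session (hypothetical helper)."""
    # Merge the inner maps of every item already seen in the session.
    merged = {}
    for item in session_items:
        for other, count in cooccurrence.get(item, {}).items():
            merged[other] = merged.get(other, 0) + count

    # Drop items that are already in the session and sort by count.
    seen = set(session_items)
    candidates = [item for item in merged if item not in seen]
    candidates.sort(key=lambda item: merged[item], reverse=True)
    result = candidates[:n]

    # Fill the remaining slots with top selling items.
    for item in top_sellers:
        if len(result) >= n:
            break
        if item not in seen and item not in result:
            result.append(item)
    return result


# A map chosen so that merging '1' and '2' gives the example above.
example_map = {
    "1": {"2": 2, "3": 1, "5": 3},
    "2": {"1": 2, "3": 1, "4": 1},
}
picks = recommend(example_map, ["1", "2"], top_sellers=["10", "15"])
```

Items with equal counts keep the order in which they were first merged, because Python's sort is stable.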

For task 2 I did the same, but the hash map is built online and the current best choices are returned at each step.
But I'm very interested in how the top 1-3 teams solved these tasks. If I find out and they allow it, I'll post it here =)

Thursday, May 26, 2011

Laptop disassembling

    Yesterday I spilled milk on the keyboard and touchpad of my laptop (Sony Vaio F series). Windows hung and I quickly turned the laptop off. After drying the front of the laptop I tried to boot again and got nothing. I was very frustrated, but decided to wait a while and give it time to dry out.
    So this morning I tried to boot again. I inserted the battery, which I had taken out right after the spill, and pressed the power button. For a couple of seconds nothing happened and my thoughts went crazy, then it started to boot... I thought that's it - everything is OK now. But boom! In the middle of the Windows boot process the display went dark and the green light near the power button started to blink. It usually blinks only when the battery is low... But the laptop was plugged into the power supply, so that couldn't be the real reason. I was in my dark place, and started to think about how I could recover my data and so on.
    I didn't have time in the morning to play with the laptop any more, so I returned to it later in the evening. First I tried to boot it again - it showed me only a black screen and the blinking green light. So I decided to disassemble the laptop and look inside. I had experience with this from my previous laptop (HP), and it wasn't so hard - they had excellent instructions on how to disassemble everything, including the display. So I downloaded the instructions, and was surprised to find only how to install a new memory stick. It was very strange to me that Sony doesn't provide proper instructions on how to deal with other hardware problems. An Internet search got me nothing - only instructions for old models.
    So I decided to work at my own risk. I got a screwdriver and removed every screw from the back panel of the laptop. Then I took my time taking that panel off.


I stared at the motherboard, but it wasn't clear to me how to remove it and get closer to the keyboard. Plus I wasn't sure that I hadn't broken anything, so I decided to test whether the laptop still worked.
And imagine my surprise when it started and booted into Windows without any problems. So I shut it down, assembled it back again, and now I'm writing from this laptop - and it works pretty fine for me.

Lessons from this story:
  - Back up your data
  - Don't drink near the computer
  - Hope will recover your laptop :)

Thursday, May 12, 2011

Dive into Java

Today I decided to solve one task in Java. The task is DMC2011, and the result of the second task must be a Java class.
So I can't write it in, for example, Scala (which I've read is a pretty cool language). So the first thing I did was google "java suck", and I found a couple of interesting texts; the most interesting one: http://www.slideshare.net/jeffz/why-java-sucks-and-c-rocks-final. Just real facts on why Java sucks and C# rocks - read it.
Next I wrote code that goes through a large (140mb) csv file, represents the information in an internal HashMap structure and then serializes it to a file. When I ran it the machine went into swap, so I stopped the execution.
So now I use only 1/5 of the file (~30mb), and I wrote Python code just to compare:
                      Java     Python
Lines of code         76       38
IMHO code looks       ugly     not so bad
Memory usage          23.3%    31.4%
Execution time (cpu)  48 sec   77 sec
Output file           140mb    150mb

As a result, I can also note that Java reads the data and builds the structure very fast but takes a long time to serialize and write it to a file. With Python it's the contrary.

Still working on this stuff. Wish me good luck :)

Sunday, May 8, 2011

Django-misc

And hi there again,

There is a django-misc application that I want to talk about.
First, I developed it with my friend, Vlad Frolov. We started working on it a couple of years ago.
It has been a part of every project we've done since then. We always used the misc/ app as the place where all the "strange", "don't know where to put it" code went.
And recently I published it on GitHub and on PyPI (it was my first experience, and it turned out to be pretty easy).
So if you want the latest version:
pip install git+git://github.com/ilblackdragon/django-misc.git
and the stable version:
pip install django-misc

If you want to use bbcodes, install postmarkup, and if you want the HTML cleaner (removes javascript, iframes, etc. to make HTML safe), install BeautifulSoup.

After installation, you can add it to INSTALLED_APPS if you want the template tags and management commands:
INSTALLED_APPS = (
   ...
   'misc',
   ...
)

From this I'll get:

  • template tags: misc_tags, bbcode_tags, html_tags, share_buttons
  • command create_app, which creates an application in the apps/ folder
  • post_sync event that creates a Site object from settings.SITE_NAME and SITE.DOMAIN
  • middlewares: SpacelessMiddleware, which removes spaces from the response HTML; StripCookieMiddleware, which removes Google Analytics cookies (to enable the Django cache for full pages)
Plus I'll have (even without adding it to INSTALLED_APPS):
  • json_encode.py : json_encode, json_response and json_template functions
  • views.py : server_error, redirect_by_name
  • decorators.py: to_template (render_to), receiver (standard on django 1.3)
  • utils.py: HttpResponseReload,  custom_spaceless, render_bbcode, strip_bbcode, str_to_class, get_alphabets (russian and english)
  • html clearer: html.clear.clear_html_code

This application is still under development, so we'll add more functionality later. But it already has plenty of cool stuff, so I add it to every project I create.
I'll work on full documentation later.

Django-manager

Hi there,

Today's topic is django-manager. This app was created because I recently needed to create a couple of Django projects from scratch, and that wasn't fun.
So the decision was to create an app with just a simple "create project" script that does all the stuff I need to do before and after the standard "django-admin.py startproject".

Plus, one time I faced the problem that I didn't have root access, so I couldn't install any application (like Django) at all. So django-manager doesn't need to be installed at all.
So, let's start from the start :)
1) Get django-manager from https://ilblackdragon@github.com/ilblackdragon/django-manager.git
git clone https://ilblackdragon@github.com/ilblackdragon/django-manager.git

2) Go to the django-manager folder and run create_project.py
cd django-manager/
./create_project.py your_project_name [path/to/project]

This will:
- Create a virtual environment for your project
- Create a project folder for the new Django project
- Create a new project with some stuff already made:
  - apps/ folder - for your Django applications
  - apps/auth_ext - contains code for profiling, and you can change it however you want
  - deploy/ - folder for your deploy scripts, requirements.txt, etc.
  - developer/ - place for scripts (for a start, the update.sh & update.bat scripts)
  In requirements.txt you'll find: django, mysql-python, south, django-misc, django-uni-form and some other useful Python libraries.
  The urls and settings are configured to work with admin and auth_ext.
- Create a git repository in the project directory and commit the changes.

Plus, when I release an update of django-manager (and I will), you can update your project by using update_project.py:
./update_project.py path/to/your_project

This will apply the patches that I'll create whenever I make changes.

Personally, I don't recommend that Windows users use virtual environments. That is why I'll shortly add options to create_project to disable the virtual environment and git steps.

PS Use Linux for your python\django development - it rulezzz ;)

Monday, April 18, 2011

Django Themes

Hello there.
Today I want to tell you about a Django application I developed a couple of months ago. I was working on the Escalibro project (as team work for classes) and we couldn't really decide what the design should be.
So at some point we decided that our service should have themes, so the user can choose between a couple of them and use whichever he wants.
After googling around for "django themes" I found only some strange app named "django-themes" with no actual source code. So I decided to write my own.
It actually was pretty easy. Django has STATIC_URL and MEDIA_URL for pointing to where images, css, js and other stuff live, so I just needed a Middleware that changes these variables on the fly. (By the way, we use MEDIA_URL = STATIC_URL, so I change only STATIC_URL, but that can easily be fixed.) But that wasn't all I wanted: I wanted the whole design (not just images and css) to be changeable. So I wrote a TemplateLoader that searches for templates depending on which theme is currently active. This gave me the possibility to override a default theme's template in another theme, so the designer has full freedom to do whatever he wants.
Plus, after beta testing, we needed a release theme - a theme where the css files are joined into one file, the js files are obfuscated, etc. So I added a DEBUG_THEME parameter and use it as an indicator of whether we are in debug or release mode.
As a result I uploaded all that code as an application on GitHub:
https://github.com/ilblackdragon/django-themes (fork it! =)

So a short "how to use":

  1. Install it with pip:
    pip install git+git://github.com/ilblackdragon/django-themes
  2. Add to settings.py: 
    INSTALLED_APPS += ('themes', )
  3. Next you need to define themes. Add a config file themes_settings.py with the following content:
    import os.path
    from themes.core import Theme, ThemesManager
    from django.utils.translation import ugettext_lazy as _
    
    PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))
    
    THEMES_MANAGER = ThemesManager()
    
    THEMES_MANAGER.add_theme(Theme(
            name = _("Default"),
            description = _("Default theme"),
            screenshot = "/static/default/screenshot.png",
            template_dir = os.path.join(PROJECT_ROOT, "templates/default"),
            media_url = "/static/default/",
    ))
    
    THEMES_MANAGER.set_default(0)
    
  4. Add a line to your urls:
      (r'^themes/', include('themes.urls')),
    
  5. Sync the database and you are ready to go
    ./manage.py syncdb
So what does this code do? First you create a themes manager - it's just a container for all the themes you have.
Next you add a theme - the name, description and screenshot will all be shown on the page where the user chooses which theme he wants to use.
And next, template_dir is the directory where all your templates are, and media_url is where your images and other stuff are.
So it's pretty easy: instead of putting all templates in the templates/ folder, you just put them in templates/default. And when you want to add a new theme, you create a folder for it and add templates that override the templates from the default theme.
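The override idea can be shown with a small stand-alone sketch. This is not the actual loader from django-themes and it doesn't use Django at all: the function find_template and its parameters are made up just to illustrate the lookup order (active theme first, default theme second).

```python
import os


def find_template(name, theme_dir, default_dir):
    """Return the path of a template, preferring the active theme.

    Hypothetical helper for illustration, not part of django-themes.
    """
    # Look in the active theme's folder first, then in the default one.
    for directory in (theme_dir, default_dir):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    # The template exists in neither theme.
    return None
```

So a new theme only needs the templates it really changes; everything else is taken from the default theme's folder.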
In a few words - just try it =)

PS. We actually failed to ship a couple of themes, but maybe soon we will have new themes on Escalibro. Join us there ;)

Sunday, April 17, 2011

First post

Just the first post for this blog.
I'll try to write here frequently.
The main topics of this blog will be development, web development, etc.
And I'll try to write all of this in English to reach a bigger audience. So I think it will be interesting :) Follow me @ilblackdragon