Merge branch 'master' of soutade.fr:iwla

Grégory Soutadé 2015-01-08 21:04:36 +01:00
commit 4dda80685b
6 changed files with 514 additions and 352 deletions

3
.gitignore vendored Normal file

@ -0,0 +1,3 @@
*~
*.pyc
*.gz


@ -4,7 +4,7 @@ iwla
Introduction
------------
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result. It's written in Python.
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has been though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filter : modify statistics until final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
@ -32,25 +32,25 @@ Main values to edit are :
* **display_hooks** : List of display hooks
* **locale** : Displayed locale (_en_ or _fr_)
Then, you can then iwla. Output HTML files are created in _output_ directory by default. To quickly see it go in output and type
Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
python -m SimpleHTTPServer 8000
Open your favorite web browser at _http://localhost:8000_. Enjoy !
**Warning** : The order is hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
**Warning** : The order in hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
Interesting default configuration values
----------------------------------------
* **DB_ROOT** : Default database directory (default ./output_db)
* **DISPLAY_ROOT** : Default HTML output directory (default ./output)
* **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
* **log_format** : Web server log format (nginx style). Default is apache log format
* **time_format** : Time format used in log format
* **pages_extensions** : Extensions that are considered as a HTML page (or result) in opposit to hits
* **viewed_http_codes** : HTTP codes that are cosidered OK (200, 304)
* **count_hit_only_visitors** : If False, doesn't cout visitors that doesn't GET a page but resources only (images, rss...)
* **count_hit_only_visitors** : If False, don't count visitors that doesn't GET a page but resources only (images, rss...)
* **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
* **css_path** : CSS path (you can add yours)
* **compress_output_files** : Files extensions to compress in gzip during display build
@ -64,7 +64,7 @@ As previously described, plugins acts like UNIX pipes : statistics are constantl
* **Post analysis plugins** : Called after basic statistics computation. They are in charge to enlight them with their own algorithms
* **Display plugins** : They are in charge to produce HTML files from statistics.
To use plugins, just insert their name in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
To use plugins, just insert their file name (without _.py_ extension) in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
Statistics are stored in dictionaries :
@ -77,7 +77,7 @@ Statistics are stored in dictionaries :
Create a Plugins
----------------
To create a new plugin, it's necessary to create a derived class of IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
To create a new plugin, it's necessary to subclass IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
Plugins can defines required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also defines required plugins (self.requires).
@ -175,34 +175,6 @@ iwla
None
plugins.display.top_downloads
-----------------------------
Display hook
Create TOP downloads page
Plugin requirements :
post_analysis/top_downloads
Conf values needed :
max_downloads_displayed*
create_all_downloads_page*
Output files :
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.all_visits
--------------------------
@ -230,34 +202,6 @@ plugins.display.all_visits
None
plugins.display.top_hits
------------------------
Display hook
Create TOP hits page
Plugin requirements :
post_analysis/top_hits
Conf values needed :
max_hits_displayed*
create_all_hits_page*
Output files :
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.referers
------------------------
@ -343,151 +287,57 @@ plugins.display.top_pages
None
plugins.post_analysis.top_downloads
-----------------------------------
plugins.display.top_hits
------------------------
Post analysis hook
Display hook
Count TOP downloads
Create TOP hits page
Plugin requirements :
None
post_analysis/top_hits
Conf values needed :
None
max_hits_displayed*
create_all_hits_page*
Output files :
None
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
None
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
plugins.display.top_downloads
-----------------------------
Post analysis hook
Display hook
Count TOP hits
Create TOP downloads page
Plugin requirements :
None
post_analysis/top_downloads
Conf values needed :
None
max_downloads_displayed*
create_all_downloads_page*
Output files :
None
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
@ -550,3 +400,153 @@ plugins.pre_analysis.robots
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
Post analysis hook
Count TOP hits
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.top_downloads
-----------------------------------
Post analysis hook
Count TOP downloads
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
Statistics deletion :
None


@ -4,7 +4,7 @@ iwla
Introduction
------------
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result. It's written in Python.
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has been though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filter : modify statistics until final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
@ -32,25 +32,25 @@ Main values to edit are :
* **display_hooks** : List of display hooks
* **locale** : Displayed locale (_en_ or _fr_)
Then, you can then iwla. Output HTML files are created in _output_ directory by default. To quickly see it go in output and type
Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
python -m SimpleHTTPServer 8000
Open your favorite web browser at _http://localhost:8000_. Enjoy !
**Warning** : The order is hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
**Warning** : The order in hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
Interesting default configuration values
----------------------------------------
* **DB_ROOT** : Default database directory (default ./output_db)
* **DISPLAY_ROOT** : Default HTML output directory (default ./output)
* **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
* **log_format** : Web server log format (nginx style). Default is apache log format
* **time_format** : Time format used in log format
* **pages_extensions** : Extensions that are considered as a HTML page (or result) in opposit to hits
* **viewed_http_codes** : HTTP codes that are cosidered OK (200, 304)
* **count_hit_only_visitors** : If False, doesn't cout visitors that doesn't GET a page but resources only (images, rss...)
* **count_hit_only_visitors** : If False, don't count visitors that doesn't GET a page but resources only (images, rss...)
* **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
* **css_path** : CSS path (you can add yours)
* **compress_output_files** : Files extensions to compress in gzip during display build
@ -64,7 +64,7 @@ As previously described, plugins acts like UNIX pipes : statistics are constantl
* **Post analysis plugins** : Called after basic statistics computation. They are in charge to enlight them with their own algorithms
* **Display plugins** : They are in charge to produce HTML files from statistics.
To use plugins, just insert their name in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
To use plugins, just insert their file name (without _.py_ extension) in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
Statistics are stored in dictionaries :
@ -77,7 +77,7 @@ Statistics are stored in dictionaries :
Create a Plugins
----------------
To create a new plugin, it's necessary to create a derived class of IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
To create a new plugin, it's necessary to subclass IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
Plugins can defines required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also defines required plugins (self.requires).


@ -83,34 +83,6 @@ iwla
None
plugins.display.top_downloads
-----------------------------
Display hook
Create TOP downloads page
Plugin requirements :
post_analysis/top_downloads
Conf values needed :
max_downloads_displayed*
create_all_downloads_page*
Output files :
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.all_visits
--------------------------
@ -138,34 +110,6 @@ plugins.display.all_visits
None
plugins.display.top_hits
------------------------
Display hook
Create TOP hits page
Plugin requirements :
post_analysis/top_hits
Conf values needed :
max_hits_displayed*
create_all_hits_page*
Output files :
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.referers
------------------------
@ -251,151 +195,57 @@ plugins.display.top_pages
None
plugins.post_analysis.top_downloads
-----------------------------------
plugins.display.top_hits
------------------------
Post analysis hook
Display hook
Count TOP downloads
Create TOP hits page
Plugin requirements :
None
post_analysis/top_hits
Conf values needed :
None
max_hits_displayed*
create_all_hits_page*
Output files :
None
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
None
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
plugins.display.top_downloads
-----------------------------
Post analysis hook
Display hook
Count TOP hits
Create TOP downloads page
Plugin requirements :
None
post_analysis/top_downloads
Conf values needed :
None
max_downloads_displayed*
create_all_downloads_page*
Output files :
None
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
@ -458,3 +308,153 @@ plugins.pre_analysis.robots
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
Post analysis hook
Count TOP hits
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.top_downloads
-----------------------------------
Post analysis hook
Count TOP downloads
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
Statistics deletion :
None


@ -0,0 +1,98 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2015
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
from iwla import IWLA
from iplugin import IPlugin
from display import *
import logging
"""
Display hook interface
Highlight new and updated statistics
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
"""
class IWLADisplayStatsDiff(IPlugin):
    def __init__(self, iwla):
        super(IWLADisplayStatsDiff, self).__init__(iwla)
        self.API_VERSION = 1
        self.month_stats_key = None
        # Set >= 0 if values of month_stats[self.month_stats_key] are lists or tuples
        self.stats_index = -1
        self.filename = None
        self.block_name = None
        self.logger = logging.getLogger(__name__)

    def load(self):
        # Subclasses must set these three attributes before calling load()
        if not self.month_stats_key or not self.filename or\
           not self.block_name:
            self.logger.error('Bad parametrization')
            return False
        month_stats = self.iwla.getMonthStats()
        # Copy of the statistics as they are at load time, compared against in hook()
        self.cur_stats = {k:v for (k,v) in month_stats.get(self.month_stats_key, {}).items()}
        return True

    def hook(self):
        display = self.iwla.getDisplay()
        month_stats = self.iwla.getMonthStats()
        path = self.iwla.getCurDisplayPath(self.filename)
        page = display.getPage(path)
        if not page: return
        title = self.iwla._(self.block_name)
        block = page.getBlock(title)
        if not block:
            self.logger.error('Block %s not found' % (title))
            return
        stats_diff = {}
        for (k, v) in month_stats[self.month_stats_key].items():
            # Despite its name, new_value is the value copied in load()
            new_value = self.cur_stats.get(k, 0)
            if new_value:
                if self.stats_index != -1:
                    if new_value[self.stats_index] != v[self.stats_index]:
                        stats_diff[k] = 'iwla_update'
                else:
                    if new_value != v:
                        stats_diff[k] = 'iwla_update'
            else:
                stats_diff[k] = 'iwla_new'
        # Tag the first cell of every row whose key is new or updated
        for (idx, row) in enumerate(block.rows):
            if row[0] in stats_diff.keys():
                block.setCellCSSClass(idx, 0, stats_diff[row[0]])
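
The classification performed in hook() can be tried outside of iwla. The following is only a minimal sketch, not part of the commit: `diff_stats` and the sample dictionaries are hypothetical names introduced for illustration, and a missing key is detected with `None` here, whereas the plugin above uses `get(k, 0)` and therefore also treats a falsy stored value as new.

```python
def diff_stats(previous, current, stats_index=-1):
    """Return {key: css_class} for keys that are new or changed.

    previous    -- statistics copied at load time (hypothetical stand-in
                   for self.cur_stats)
    current     -- statistics after analysis (stand-in for month_stats[key])
    stats_index -- -1 for scalar values, otherwise the index to compare
                   inside list/tuple values
    """
    result = {}
    for key, value in current.items():
        old = previous.get(key)
        if old is None:
            # Key did not exist before: mark it as new
            result[key] = 'iwla_new'
        elif stats_index != -1:
            # Values are sequences: compare only the selected element
            if old[stats_index] != value[stats_index]:
                result[key] = 'iwla_update'
        elif old != value:
            # Scalar values: compare directly
            result[key] = 'iwla_update'
    return result


# Hypothetical sample data: key phrase -> number of occurrences
previous = {'python log analyzer': 3, 'awstats clone': 1}
current = {'python log analyzer': 5, 'awstats clone': 1, 'iwla': 2}
print(diff_stats(previous, current))
```

Unchanged keys are left out of the result, which is why the plugin only sets a CSS class on rows whose first cell appears in the computed dictionary.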


@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2015
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
from iwla import IWLA
from istats_diff import IWLADisplayStatsDiff
from display import *
"""
Display hook
Highlight new and updated key phrases in all_key_phrases.html
Plugin requirements :
display/referers
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
"""
class IWLADisplayReferersDiff(IWLADisplayStatsDiff):
    def __init__(self, iwla):
        super(IWLADisplayReferersDiff, self).__init__(iwla)
        self.API_VERSION = 1
        self.requires = ['IWLADisplayReferers']
        self.month_stats_key = 'key_phrases'
        self.filename = 'key_phrases.html'
        self.block_name = u'Key phrases'

    def load(self):
        if not self.iwla.getConfValue('create_all_key_phrases_page', True):
            return False
        return super(IWLADisplayReferersDiff, self).load()