Compare commits
64 Commits
| SHA1 |
|---|
| b2c9879412 |
| 8691302741 |
| 53e7390b77 |
| 9b32a81ddb |
| 7b0ca661a1 |
| 4d0b993aec |
| 0211596508 |
| bde91ca936 |
| 70de0d3aca |
| 9939922c31 |
| 6d46ac4461 |
| 46c9ae4f15 |
| 122ee875fa |
| a03b1dfc4f |
| c9500e2e99 |
| ca3c0eefdf |
| 1e09852d18 |
| db9009bb28 |
| e2210f3eab |
| 9db72f41fd |
| 0464a3d8e7 |
| b9566beb80 |
| d78739157b |
| d6d216db4d |
| 974d355dd4 |
| f1ffbe40d8 |
| 83275a8db4 |
| 07eb919837 |
| 16cd817fec |
| d32b2440ee |
| 9d3ff8b3b7 |
| 9c688e1545 |
| 7ef0911fa7 |
| 7507b8e77f |
| b1b92412e0 |
| b1e6f973a6 |
| de79f526dd |
| 4b58048198 |
| 71d8ee2113 |
| 440f51ddd1 |
| cad3467c25 |
| 44c76007cd |
| adc04bf753 |
| 6500d98bdd |
| a0a1f42df4 |
| 31bc67ceba |
| 3fdbc282c8 |
| 5f96c44edf |
| 58d31d842a |
| f871f4975c |
| 16b0619f19 |
| c8dfdd17f7 |
| a5bef4ece6 |
| b29765dda9 |
| cb18cf928e |
| 21a21cd68f |
| 72db40d593 |
| c6ce5cfc6f |
| 185664850d |
| fef9c783f6 |
| 6a4fd4e9c8 |
| ac246eabe2 |
| 9c57ad3ece |
| 3a8c667fdc |
48
ChangeLog
@@ -1,6 +1,52 @@
v0.8 (03/02/2025)
** User **
Add multimedia_re filter to detect multimedia files by regular expression
Add domain and number of subscribers for feed parser
Add "no_merge_feeds_parsers"_list conf value
Add "robot_domains" configuration value
Add rule for robot : forbid only "1 page and 1 hit"
Add "ignore_url" conf value

** Dev **
Sanitize HTTP requests before analysis
Try to detect robots by "compatible" strings
Move feeds and reverse_dns plugins from post_analysis to pre_analysis
Move reverse DNS core management into iwla.py

** Bugs **
Fix potential division by 0


v0.7 (17/03/2024)
** User **
Awstats data updated (7.9)
Improve page/hit detection
--display-only switch now takes an argument (month/year), analyze is not yet necessary
Add --disable-display option
Geo IP plugin updated (use of [ip-api.com](https://ip-api.com/))
Add _subdomains_ plugin
New way to display global statistics : with links in months names instead of "Details" button
Add excluded domain option

** Dev **
Remove detection from awstats dataset for browser
Don't analyze referer for non viewed hits/pages
Remove all trailing slashes of URL before starting analysis
Main key for visits is now "remote\_ip" and not "remote\_addr"
Add IP type plugin to support IPv4 and IPv6
Update robot detection
Display visitor IP is now a filter
Generate HTML part in dry run mode (but don't write it to disk)
Set lang value in generated HTML page
Add no\_referrer\_domains list to default_conf for websites that define this policy
Set count\_hit\_only\_visitors to False by default

** Bugs **
Flags management for feeds display

v0.6 (20/11/2022)
** User **
Replace track_users by filter_users plugins which can itnerpret conditional filters from configuration
Replace track_users by filter_users plugins which can interpret conditional filters from configuration
Don't save all visitors requests into database (save space and computing). Can be changed in default_conf.py with keep_requests value
Replace -c argument by config file. Now clean output is -C
Add favicon
90
conf.py
@@ -1,6 +1,8 @@
#DB_ROOT = './output_db'
#DISPLAY_ROOT = './output_dev'

# Web server log
analyzed_filename = '/var/log/apache2/access.log.1,/var/log/apache2/access.log'
analyzed_filename = '/var/log/apache2/soutade.fr_access.log.1,/var/log/apache2/soutade.fr_access.log'

# Domain name to analyze
domain_name = 'soutade.fr'
@@ -10,49 +12,99 @@ display_visitor_ip = True

# Hooks used
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'reverse_dns', 'ip_to_geo']
display_hooks = ['filter_users', 'top_visitors', 'all_visits', 'referers', 'top_pages', 'top_downloads', 'referers_diff', 'ip_to_geo', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'top_downloads_diff', 'robot_bandwidth', 'top_pages_diff']
post_analysis_hooks = ['reverse_dns', 'referers', 'top_pages', 'subdomains', 'top_downloads', 'operating_systems', 'browsers', 'hours_stats', 'feeds', 'ip_to_geo', 'filter_users']
display_hooks = ['filter_users', 'top_visitors', 'all_visits', 'referers', 'top_pages', 'subdomains', 'top_downloads', 'referers_diff', 'ip_to_geo', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'top_downloads_diff', 'robot_bandwidth', 'top_pages_diff', 'all_visits_enlight']

# Reverse DNS timeout
reverse_dns_timeout = 0.2

# Count these addresses as hit
page_to_hit_conf = [r'^.+/logo[/]?$']
page_to_hit_conf = [r'.+/logo[/]?', r'.+/.+\.py']
# Count these addresses as page
hit_to_page_conf = [r'^.+/category/.+$', r'^.+/tag/.+$', r'^.+/archive/.+$', r'^.+/ljdc[/]?$', r'^.+/source/tree/.*$', r'^.+/source/file/.*$', r'^.+/search/.+$']
hit_to_page_conf = [
    # Blog
    r'.+/category/.+', r'.+/tag/.+', r'.+/archive/.+', r'.+/ljdc[/]?', r'.*/search/.+',
    # Indefero
    r'.+/source/tree/.*', r'.+/source/file/.*', r'.*/index$',
    # Denote
    r'.*/edit$', r'.*/add$', r'.+/[0-9]+$', r'.*/preferences$', r'.*/search$', r'.*/public_notes$', r'.*/template.*', r'.*/templates$',
    # Music
    r'.*/music/.*',
]

# Because it's too long to build HTML when there are too many entries
max_hits_displayed = 100
max_downloads_displayed = 100

# Compressed files
compress_output_files = ['html', 'css', 'js']

# Locale in French
#locale = 'fr'

# Tracked IP
tracked_ip = ['192.168.1.1']
locale = 'fr'

# Filtered IP
filtered_ip = [
    # r'192.168.*', # Local
]
filtered_ip = ['82.232.68.211', '78.153.243.190', '176.152.215.133',
               '83.199.87.88', # Lanion
               '193.136.115.1' # Lisbon
]

import re
# google_re = re.compile('.*google.*')
# duck_re = re.compile('.*duckduckgo.*')
soutade_re = re.compile('.*soutade.fr.*')

def my_filter(iwla, visitor):
    # Manage filtered users
    if visitor.get('filtered', False): return True
    filtered = False
    req = visitor['requests'][0]
    if visitor.get('country_code', '') == 'fr' and\
       req['server_name'] in ('blog.soutade.fr', 'www.soutade.fr', 'soutade.fr') and \
       req['extract_request']['extract_uri'] in ('/', '/index.html', '/about.html'):
        referer = req['extract_referer']['extract_uri']
        if referer in ('', '-'):
            # print(f'{req} MATCHED')
            filtered = True
        elif not soutade_re.match(referer):
            # if google_re.match(referer) or duck_re.match(referer):
            # print(f'{req} MATCHED')
            filtered = True

    # Manage enlight users
    if visitor.get('enlight', None) is None and not visitor.get('feed_parser', False):
        enlight = False
        for i, req in enumerate(visitor['requests']):
            if i == 0 and req['server_name'] in ('indefero.soutade.fr'): break
            if req['server_name'] in ('blog.soutade.fr') and \
               req['extract_request']['extract_uri'] in ('/', '/index.html'):
                enlight = True
                break
        visitor['enlight'] = enlight
    return filtered

filtered_users = [
    # [['country_code', '=', 'cn'], ['viewed_pages', '>=', '100']],
    #[['country_code', '=', 'fr'], ['viewed_pages', '>=', '5'], ['viewed_hits', '>=', '5']],
    [my_filter],
    # [['country_code', '=', 'fr'], my_filter],
]

# Excluded IP
excluded_ip = [
    r'192.168.*', # Local
    r'117.78.58.*', # China ecs-117-78-58-25.compute.hwclouds-dns.com
    #'79.141.15.51', # Elsys
    #'165.225.20.107', # ST
    #'165.225.76.184', # ST #2
    '147.161.180.110', # Schneider
    '147.161.182.108', # Schneider 2
    '147.161.182.86', # Schneider 3
]

# Feeds url
feeds = [r'/atom.xml', r'/rss.xml']

# Feeds referers url
feeds_referers = ['https://feedly.com']
# Feeds agent url
# feeds_agents = [r'.*feedly.com.*']

merge_feeds_parsers = True
merge_feeds_parsers_list = [r'ec2-.*.compute-1.amazonaws.com']

# Consider xml files as multimedia (append to current list)
multimedia_files_append = ['xml']
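The commented-out `filtered_users` entries above combine conditions of the form `[key, operator, value]` with callables such as `my_filter`. The following is a minimal sketch of how such rules could be evaluated. It assumes that all conditions inside one inner list must hold (AND), that the outer list holds alternatives (OR), and that visitor fields are flat values; the real `filter_users` plugin may differ on all three points, and `match_condition` and `is_filtered` are hypothetical names.

```python
import operator

# Comparison operators accepted in a condition (assumed set; the real plugin may differ)
_OPS = {
    '=': operator.eq,
    '!=': operator.ne,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
}


def _coerce(value, reference):
    """Convert the configured reference (a string in conf.py) to the type of the visitor field."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return type(value)(reference)
    return reference


def match_condition(iwla, visitor, condition):
    """Evaluate one rule entry: a callable or a [key, operator, value] triple."""
    if callable(condition):
        # Entries such as my_filter decide on their own
        return bool(condition(iwla, visitor))
    key, op, reference = condition
    if key not in visitor:
        return False
    value = visitor[key]
    return _OPS[op](value, _coerce(value, reference))


def is_filtered(iwla, visitor, filtered_users):
    """Outer list = alternatives (OR); inner list = conditions that must all hold (AND)."""
    return any(all(match_condition(iwla, visitor, c) for c in rule)
               for rule in filtered_users)


rules = [[['country_code', '=', 'cn'], ['viewed_pages', '>=', '100']]]
print(is_filtered(None, {'country_code': 'cn', 'viewed_pages': 150}, rules))  # True
```

One detail worth knowing when writing such filters: in `my_filter`, an expression like `req['server_name'] in ('blog.soutade.fr')` is a substring test, because parentheses alone do not create a tuple. A one-element tuple needs a trailing comma, as in `('blog.soutade.fr',)`.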
@@ -62,3 +114,5 @@ count_hit_only_visitors = False

# Not all robots bandwidth (too big)
create_all_robot_bandwidth_page = False

#keep_requests = True
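The new `page_to_hit_conf` and `hit_to_page_conf` values in conf.py drop the explicit `^...$` anchors. The sketch below shows how a request could be reclassified with these patterns, under the assumption (not confirmed by this diff) that each pattern is applied with `re.match` to the request URI; `reclassify` is a hypothetical helper. Because `re.match` anchors only at the start of the string, a pattern without a trailing `$` also matches longer URIs.

```python
import re

# Subset of the new patterns from conf.py
page_to_hit_conf = [r'.+/logo[/]?', r'.+/.+\.py']
hit_to_page_conf = [r'.+/category/.+', r'.+/tag/.+', r'.*/index$', r'.+/[0-9]+$']

_page_to_hit = [re.compile(p) for p in page_to_hit_conf]
_hit_to_page = [re.compile(p) for p in hit_to_page_conf]


def reclassify(uri, is_page):
    """Return True if the request should be counted as a page (hypothetical helper)."""
    # re.match anchors at the start of the string only: without '$',
    # '.+/logo[/]?' also matches '/blog/logo/extra'
    if is_page and any(p.match(uri) for p in _page_to_hit):
        return False
    if not is_page and any(p.match(uri) for p in _hit_to_page):
        return True
    return is_page


print(reclassify('/blog/category/python', False))  # True
print(reclassify('/blog/logo', True))              # False
```

If a pattern must match the whole URI, keeping a trailing `$` (as in `r'.*/index$'`) or using `re.fullmatch` makes that explicit.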
default_conf.py
@@ -38,12 +38,16 @@ pages_extensions = ['/', 'htm', 'html', 'xhtml', 'py', 'pl', 'rb', 'php']
# HTTP codes that are considered OK
viewed_http_codes = [200, 304]

# URL to ignore
ignore_url = []

# If False, don't count visitors that don't GET a page but only resources (images, rss...)
count_hit_only_visitors = True
count_hit_only_visitors = False

# Multimedia extensions (not accounted as downloaded files)
multimedia_files = ['png', 'jpg', 'jpeg', 'gif', 'ico', 'svg',
                    'css', 'js']
multimedia_files_re = []

# Default resources path (will be symlinked in DISPLAY_OUTPUT)
resources_path = ['resources']
@@ -59,7 +63,19 @@ compress_output_files = ['html', 'css', 'js']
locales_path = './locales'

# Default locale (english)
locale = 'en_EN'
locale = 'en'

# Don't keep requests of all visitors into database
keep_requests = False

# Domain names that should be ignored
excluded_domain_name = []

# Domains that set no-referrer as Referrer-Policy
no_referrer_domains = []

# Domains used by robots
robot_domains = []

# Feeds agent identifier
feeds_agents = [r'.*NextCloud-News']
75
display.py
@@ -39,6 +39,9 @@ class DisplayHTMLRaw(object):
        self.iwla = iwla
        self.html = html

    def resetHTML(self):
        self.html = ''

    def setRawHTML(self, html):
        self.html = html

@@ -106,10 +109,12 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
        self.rows_cssclasses = []
        self.table_css = u'iwla_table'
        self.human_readable_cols = human_readable_cols or []

    def appendRow(self, row):
        self.objects = []

    def appendRow(self, row, _object=None):
        self.rows.append(listToStr(row))
        self.rows_cssclasses.append([u''] * len(row))
        self.objects.append(_object)

    def insertCol(self, col_number, col_title='', col_css_class=''):
        self.cols.insert(col_number, col_title)
@@ -139,6 +144,12 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):

        return self.rows[row][col]

    def getRowObject(self, row):
        if row < 0 or row >= len(self.rows):
            raise ValueError('Invalid indices %d' % (row))

        return self.objects[row]

    def setCellValue(self, row, col, value):
        if row < 0 or col < 0 or\
           row >= len(self.rows) or col >= len(self.cols):
@@ -196,7 +207,7 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
        self.insertCol(column_insertion, self.iwla._('Ratio'), u'iwla_hit')
        for (index, r) in enumerate(self.rows):
            val = r[column] and int(r[column]) or 0
            self.setCellValue(index, column_insertion, '%.1f%%' % (float(val*100)/float(total)))
            self.setCellValue(index, column_insertion, '%.1f%%' % (total and float(val*100)/float(total) or 0))

    def _filter(self, function, column, args):
        target_col = None
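The ratio change above guards against a zero `total` with the `total and x or 0` idiom. A conditional expression states the same guard more directly. The helpers below are hypothetical stand-alone versions of that line, not code from display.py.

```python
def format_ratio(value, total):
    """Format value/total as a percentage with one decimal; '0.0%' when total is 0."""
    ratio = value * 100.0 / total if total else 0.0
    return '%.1f%%' % ratio


def ratio_column(counts):
    """Return one formatted ratio per count, relative to their sum (hypothetical helper)."""
    total = sum(counts)
    return [format_ratio(c, total) for c in counts]


print(ratio_column([1, 3]))  # ['25.0%', '75.0%']
print(ratio_column([0, 0]))  # ['0.0%', '0.0%']
```

With the original idiom, a ratio of exactly 0.0 falls through to `or 0`, which still formats as `0.0%`, so both forms print the same result; the conditional expression simply avoids relying on truthiness.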
@@ -205,9 +216,9 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
                target_col = col
                break
        if target_col is None: return
        for row in self.rows:
            res = function(row[target_col], **args)
            if res:
        for idx, row in enumerate(self.rows):
            res = function(row[target_col], self.objects[idx], **args)
            if res is not None:
                row[target_col] = res

    def _buildHTML(self):
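After this change, `DisplayHTMLBlockTable` keeps one object per row (`appendRow(row, _object)`), and `_filter` calls each column filter as `function(cell_value, row_object, **args)`, replacing the cell only when the result is not `None`. The sketch below reproduces that contract in a small stand-alone class; `MiniTable`, `filterColumn` and `show_ip` are hypothetical names used only for illustration.

```python
class MiniTable(object):
    """Stand-in for DisplayHTMLBlockTable: string rows plus one attached object per row."""

    def __init__(self, cols):
        self.cols = list(cols)
        self.rows = []
        self.objects = []

    def appendRow(self, row, _object=None):
        self.rows.append([str(v) for v in row])
        self.objects.append(_object)

    def getRowObject(self, row):
        if row < 0 or row >= len(self.rows):
            raise ValueError('Invalid indices %d' % (row))
        return self.objects[row]

    def filterColumn(self, column, function, **args):
        if column not in self.cols:
            return
        target_col = self.cols.index(column)
        for idx, row in enumerate(self.rows):
            res = function(row[target_col], self.objects[idx], **args)
            # None leaves the cell unchanged; any other value (even '') replaces it
            if res is not None:
                row[target_col] = res


def show_ip(value, visitor):
    """Hypothetical column filter: append the raw IP when the name came from reverse DNS."""
    if visitor and visitor.get('dns_name_replaced'):
        return '%s [%s]' % (value, visitor['remote_ip'])
    return None


table = MiniTable(['Host', 'Pages'])
table.appendRow(['example.net', 3], {'remote_ip': '192.0.2.1', 'dns_name_replaced': True})
table.appendRow(['198.51.100.7', 1], {'remote_ip': '198.51.100.7'})
table.filterColumn('Host', show_ip)
print(table.rows)  # [['example.net [192.0.2.1]', '3'], ['198.51.100.7', '1']]
```

Testing `res is not None` instead of `if res:` means a filter can deliberately set a cell to an empty string, while returning `None` leaves the cell untouched.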
@@ -353,23 +364,21 @@ class DisplayHTMLPage(object):

        self.logger.debug('Write %s' % (filename))

        if self.iwla.dry_run: return

        f = codecs.open(filename, 'w', 'utf-8')
        f.write(u'<!DOCTYPE html>')
        f.write(u'<html>')
        f.write(u'<head>')
        f.write(u'<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />')
        f.write(u'<link rel="icon" type="image/png" href="/resources/icon/favicon.png"/>')
        f.write(u'<!DOCTYPE html>\n')
        f.write(u'<html lang="{}">\n'.format(self.iwla.getConfValue('locale', 'en')))
        f.write(u'<head>\n')
        f.write(u'<meta http-equiv="Content-type" content="text/html; charset=UTF-8"/>\n')
        f.write(u'<link rel="icon" type="image/png" href="/resources/icon/favicon.png"/>\n')
        for css in self.css_path:
            f.write(u'<link rel="stylesheet" href="/%s"/>' % (css))
            f.write(u'<link rel="stylesheet" href="/%s"/>\n' % (css))
        if self.title:
            f.write(u'<title>%s</title>' % (self.title))
        f.write(u'</head><body>')
            f.write(u'<title>%s</title>\n' % (self.title))
        f.write(u'</head><body>\n')
        for block in self.blocks:
            block.build(f, filters=filters)
        if displayVersion:
            f.write(u'<div style="text-align:center;width:100%%">Generated by <a href="%s">IWLA %s</a></div>' %
            f.write(u'<div style="text-align:center;width:100%%">Generated by <a href="%s">IWLA %s</a></div>\n' %
                    ("http://indefero.soutade.fr/p/iwla", self.iwla.getVersion()))
        f.write(u'</body></html>')
        f.close()
@@ -403,15 +412,14 @@ class DisplayHTMLBuild(object):
        self.pages.append(page)

    def build(self, root):
        if not self.iwla.dry_run:
            display_root = self.iwla.getConfValue('DISPLAY_ROOT', '')
            if not os.path.exists(display_root):
                os.makedirs(display_root)
            for res_path in self.iwla.getResourcesPath():
                target = os.path.abspath(res_path)
                link_name = os.path.join(display_root, res_path)
                if not os.path.exists(link_name):
                    os.symlink(target, link_name)
        display_root = self.iwla.getConfValue('DISPLAY_ROOT', '')
        if not os.path.exists(display_root):
            os.makedirs(display_root)
        for res_path in self.iwla.getResourcesPath():
            target = os.path.abspath(res_path)
            link_name = os.path.join(display_root, res_path)
            if not os.path.exists(link_name):
                os.symlink(target, link_name)

        for page in self.pages:
            page.build(root, filters=self.filters)
@@ -419,6 +427,21 @@ class DisplayHTMLBuild(object):
    def addColumnFilter(self, column, function, args):
        self.filters.append(({'column':column, 'args':args}, function))

    def getDisplayName(self, visitor):
        display_visitor_ip = True
        compact_host_name = True
        address = visitor['remote_addr']
        if display_visitor_ip and\
           super_hit.get('dns_name_replaced', False):
            host_name = address
            if compact_host_name:
                ip = visitor['remote_ip'].replace('.', '-')
                host_name = host_name.replace(ip, 'IP')
                ip = ip.replace('-', '')
                host_name = host_name.replace(ip, 'IP')
            address = '%s [%s]' % (host_name, visitor['remote_ip'])
        return address


#
# Global functions
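The new `getDisplayName` replaces the IP address embedded in a reverse DNS name with `IP`, both in its dashed form (`192-0-2-1`) and in its undotted form (`192021`), then appends the raw IP in brackets. In the hunk, the `dns_name_replaced` flag is read from `super_hit`, a name not defined within the method as shown; the stand-alone sketch below reads it from `visitor` instead, which is an assumption. `display_name` is a hypothetical name.

```python
def display_name(visitor, display_visitor_ip=True, compact_host_name=True):
    """Build 'host [ip]' with the IP embedded in the host name replaced by 'IP'."""
    address = visitor['remote_addr']
    # Assumption: the reverse DNS flag is stored on the visitor record itself
    if display_visitor_ip and visitor.get('dns_name_replaced', False):
        host_name = address
        if compact_host_name:
            ip = visitor['remote_ip'].replace('.', '-')   # 192.0.2.1 -> 192-0-2-1
            host_name = host_name.replace(ip, 'IP')
            ip = ip.replace('-', '')                       # 192-0-2-1 -> 192021
            host_name = host_name.replace(ip, 'IP')
        address = '%s [%s]' % (host_name, visitor['remote_ip'])
    return address


print(display_name({'remote_addr': 'host-192-0-2-1.example.net',
                    'remote_ip': '192.0.2.1',
                    'dns_name_replaced': True}))
# host-IP.example.net [192.0.2.1]
```

Note that only dotted (IPv4) addresses are compacted this way; an IPv6 address has no dots to replace, so its host name is left as is.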
268
docs/index.md
@@ -6,7 +6,7 @@ Introduction

iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolithic project with everything in one big PERL file. In contrast, iwla has been designed to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filter : modify statistics until final result. It's written in Python.

Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
Nevertheless, iwla is only focused on HTTP logs. It uses data (search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).

Demo
----
@@ -16,8 +16,7 @@ A demonstration instance is available [here](https://iwla-demo.soutade.fr)
Usage
-----

./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]

./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-P|--disable-display] [-D|--dry-run]
-c : Configuration file to use (default conf.py)
-C : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
@@ -26,6 +25,7 @@ Usage
-r : Reset analysis to a specific date (month/year)
-z : Don't compress databases (bigger but faster, not compatible with compressed databases)
-p : Only generate display
-P : Don't generate display
-D : Dry run (don't write/update files to disk)

Basic usage
@@ -48,6 +48,7 @@ You can also append an element to an existing default configuration list by usin
multimedia_files_append = ['xml']
or
multimedia_files_append = 'xml'

Will append 'xml' to current multimedia_files list

Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
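The `_append` convention described above can be sketched as follows. This is a hypothetical merge helper written for illustration, not iwla's configuration loader; it assumes the `_append` suffix is stripped to find the default list, and that a plain string is treated as a single element.

```python
def apply_appends(defaults, user_conf):
    """Merge user values into defaults; '<name>_append' extends the default list <name>."""
    merged = dict(defaults)
    for key, value in user_conf.items():
        if key.endswith('_append'):
            base = key[:-len('_append')]
            # A plain string counts as a single element
            extra = [value] if isinstance(value, str) else list(value)
            # Build a new list so the defaults themselves are not modified
            merged[base] = list(merged.get(base, [])) + extra
        else:
            merged[key] = value
    return merged


defaults = {'multimedia_files': ['png', 'jpg', 'css', 'js']}
print(apply_appends(defaults, {'multimedia_files_append': 'xml'})['multimedia_files'])
# ['png', 'jpg', 'css', 'js', 'xml']
```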
@@ -87,7 +88,7 @@ To use plugins, just insert their file name (without _.py_ extension) in _pre_an
Statistics are stored in dictionaries :

* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **valid_visitors** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of their requests (only if 'keep_requests' is true or filtered)
* **meta** : Final result of month statistics (by year)
@@ -103,6 +104,7 @@ The two functions to overload are _load(self)_ that must returns True or False i

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.


Plugins
=======

@@ -116,30 +118,35 @@ Optional configuration values ends with *.
* plugins/display/filter_users.py
* plugins/display/hours_stats.py
* plugins/display/ip_to_geo.py
* plugins/display/ip_type.py
* plugins/display/istats_diff.py
* plugins/display/operating_systems.py
* plugins/display/referers_diff.py
* plugins/display/referers.py
* plugins/display/robot_bandwidth.py
* plugins/display/subdomains.py
* plugins/display/top_downloads_diff.py
* plugins/display/top_downloads.py
* plugins/display/top_hits.py
* plugins/display/top_pages_diff.py
* plugins/display/top_pages.py
* plugins/display/top_visitors.py
* plugins/display/visitor_ip.py
* plugins/post_analysis/anonymize_ip.py
* plugins/post_analysis/browsers.py
* plugins/post_analysis/feeds.py
* plugins/post_analysis/filter_users.py
* plugins/post_analysis/hours_stats.py
* plugins/post_analysis/ip_to_geo.py
* plugins/post_analysis/ip_type.py
* plugins/post_analysis/operating_systems.py
* plugins/post_analysis/referers.py
* plugins/post_analysis/reverse_dns.py
* plugins/post_analysis/subdomains.py
* plugins/post_analysis/top_downloads.py
* plugins/post_analysis/top_hits.py
* plugins/post_analysis/top_pages.py
* plugins/pre_analysis/feeds.py
* plugins/pre_analysis/page_to_hit.py
* plugins/pre_analysis/reverse_dns.py
* plugins/pre_analysis/robots.py


@@ -157,8 +164,13 @@ iwla
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*

Output files :
DB_ROOT/meta.db
@@ -199,7 +211,7 @@ iwla
nb_visitors

visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@@ -423,6 +435,32 @@ plugins.display.ip_to_geo
None


plugins.display.ip_type
-----------------------

Display hook

Add IPv4/IPv6 statistics

Plugin requirements :
post_analysis/ip_type

Conf values needed :
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.istats_diff
---------------------------

@@ -543,7 +581,6 @@ plugins.display.robot_bandwidth
None

Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*

Output files :
@@ -560,6 +597,32 @@ plugins.display.robot_bandwidth
None


plugins.display.subdomains
--------------------------

Display hook

Add subdomains statistics

Plugin requirements :
post_analysis/subdomains

Conf values needed :
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.top_downloads_diff
----------------------------------

@@ -707,7 +770,33 @@ plugins.display.top_visitors
None

Conf values needed :
display_visitor_ip*
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.visitor_ip
--------------------------

Display hook

Display IP below visitor name

Plugin requirements :
None

Conf values needed :
compact_ip*

Output files :
OUTPUT_ROOT/year/month/index.html
@@ -767,7 +856,7 @@ plugins.post_analysis.browsers

Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser

month_stats :
@@ -781,38 +870,6 @@ plugins.post_analysis.browsers
None


plugins.post_analysis.feeds
---------------------------

Post analysis hook

Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.

Plugin requirements :
None

Conf values needed :
feeds
feeds_referers*
merge_feeds_parsers*

Output files :
None

Statistics creation :
remote_addr =>
feed_parser
feed_name_analysed

Statistics update :
None

Statistics deletion :
None


plugins.post_analysis.filter_users
----------------------------------

@@ -856,13 +913,13 @@ plugins.post_analysis.filter_users

Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location

Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests

Statistics deletion :
@@ -936,6 +993,37 @@ plugins.post_analysis.ip_to_geo
None


plugins.post_analysis.ip_type
-----------------------------

Post analysis hook

Detect if IP is IPv4 or IPv6

Plugin requirements :
None

Conf values needed :
None

Output files :
None

Statistics creation :
visits :
remote_ip =>
ip_type

month_stats :
ip_type : {4: XXX, 6: XXX}

Statistics update :
None

Statistics deletion :
None


plugins.post_analysis.operating_systems
---------------------------------------

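The `post_analysis/ip_type` plugin documented above stores `ip_type` for each visitor and an `ip_type : {4: XXX, 6: XXX}` counter in `month_stats`. A minimal sketch of that classification using Python's standard `ipaddress` module, assuming `visits` is keyed by `remote_ip` as the documentation states; `ip_version` and `count_ip_types` are hypothetical names.

```python
import ipaddress
from collections import Counter


def ip_version(remote_ip):
    """Return 4 or 6 for a textual IP address."""
    return ipaddress.ip_address(remote_ip).version


def count_ip_types(visits):
    """Tag each visitor with 'ip_type' and return {4: n, 6: m}, like month_stats['ip_type']."""
    counts = Counter({4: 0, 6: 0})
    for remote_ip, visitor in visits.items():
        visitor['ip_type'] = ip_version(remote_ip)
        counts[visitor['ip_type']] += 1
    return dict(counts)


visits = {'192.0.2.1': {}, '2001:db8::1': {}, '198.51.100.2': {}}
print(count_ip_types(visits))  # {4: 2, 6: 1}
```

`ipaddress.ip_address` raises `ValueError` on malformed input, so a real plugin would probably want to catch it for unparsable log entries.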
@@ -954,7 +1042,7 @@ plugins.post_analysis.operating_systems

Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system

month_stats :
@@ -1008,30 +1096,29 @@ plugins.post_analysis.referers
None


plugins.post_analysis.reverse_dns
---------------------------------
plugins.post_analysis.subdomains
--------------------------------

Post analysis hook

Replace IP by reverse DNS names
Group top pages by subdomains

Plugin requirements :
None
post_analysis/top_pages

Conf values needed :
reverse_dns_timeout*
None

Output files :
None

Statistics creation :
None
month_stats:
subdomains =>
domain => count

Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
None

Statistics deletion :
None
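The `post_analysis/subdomains` plugin groups top pages by subdomain into `month_stats` (`subdomains => domain => count`). The sketch below assumes, hypothetically, that top page keys look like `host/path` and map to a count; the actual key format produced by `top_pages` is not shown in this diff, and `group_by_subdomain` is an illustrative name.

```python
from collections import defaultdict


def group_by_subdomain(top_pages):
    """Sum page counts per host, assuming keys of the form 'host/path' (hypothetical format)."""
    subdomains = defaultdict(int)
    for page, count in top_pages.items():
        host = page.split('/', 1)[0]
        subdomains[host] += count
    return dict(subdomains)


top_pages = {'blog.soutade.fr/': 10,
             'blog.soutade.fr/about.html': 2,
             'indefero.soutade.fr/p/iwla': 5}
print(group_by_subdomain(top_pages))  # {'blog.soutade.fr': 12, 'indefero.soutade.fr': 5}
```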
@@ -1121,6 +1208,45 @@ plugins.post_analysis.top_pages
None


plugins.pre_analysis.feeds
--------------------------

Pre analysis hook

Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.

Warning : When merge_feeds_parsers is activated, the displayed last access date is the most
recent date of all merged parsers found

Plugin requirements :
None

Conf values needed :
feeds
feeds_agents*
merge_feeds_parsers*

Output files :
None

Statistics creation :
remote_ip =>
feed_parser
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers

Statistics update :
None

Statistics deletion :
None


plugins.pre_analysis.page_to_hit
--------------------------------

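With `merge_feeds_parsers` enabled, `pre_analysis/feeds` merges parsers that share a user agent, and the displayed last access is the most recent date among the merged parsers. A minimal sketch of that grouping, assuming each parser record carries hypothetical `user_agent` and `last_access` fields; the plugin's real fields (such as `feed_parser_last_access`) may be organised differently, and `merge_feed_parsers` is an illustrative name.

```python
from datetime import datetime


def merge_feed_parsers(parsers):
    """Group parsers by user agent, keeping every IP and the most recent access date."""
    merged = {}
    for remote_ip, info in parsers.items():
        entry = merged.get(info['user_agent'])
        if entry is None:
            merged[info['user_agent']] = {'ips': [remote_ip],
                                          'last_access': info['last_access']}
        else:
            entry['ips'].append(remote_ip)
            # The displayed date is the most recent one among merged parsers
            entry['last_access'] = max(entry['last_access'], info['last_access'])
    return merged


parsers = {
    '192.0.2.1': {'user_agent': 'FeedReader/1.0', 'last_access': datetime(2025, 1, 1)},
    '192.0.2.2': {'user_agent': 'FeedReader/1.0', 'last_access': datetime(2025, 1, 5)},
    '198.51.100.3': {'user_agent': 'OtherReader', 'last_access': datetime(2024, 12, 31)},
}
merged = merge_feed_parsers(parsers)
print(merged['FeedReader/1.0']['ips'])  # ['192.0.2.1', '192.0.2.2']
```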
@@ -1149,6 +1275,35 @@ plugins.pre_analysis.page_to_hit
None


plugins.pre_analysis.reverse_dns
--------------------------------

Pre analysis hook

Replace IP by reverse DNS names

Plugin requirements :
None

Conf values needed :
robot_domains*

Output files :
None

Statistics creation :
None

Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed

Statistics deletion :
None


plugins.pre_analysis.robots
---------------------------

@@ -1160,7 +1315,8 @@ plugins.pre_analysis.robots
None

Conf values needed :
None
count_hit_only_visitors
no_referrer_domains

Output files :
None
10
docs/main.md
@@ -6,7 +6,7 @@ Introduction

iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolithic project with everything in one big PERL file. In contrast, iwla has been designed to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filter : modify statistics until final result. It's written in Python.

Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
Nevertheless, iwla is only focused on HTTP logs. It uses data (search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).

Demo
----
@@ -16,8 +16,7 @@ A demonstration instance is available [here](https://iwla-demo.soutade.fr)
Usage
-----

./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]

./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-P|--disable-display] [-D|--dry-run]
-c : Configuration file to use (default conf.py)
-C : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
@@ -26,6 +25,7 @@ Usage
-r : Reset analysis to a specific date (month/year)
-z : Don't compress databases (bigger but faster, not compatible with compressed databases)
-p : Only generate display
-P : Don't generate display
-D : Dry run (don't write/update files to disk)

Basic usage
@@ -48,6 +48,7 @@ You can also append an element to an existing default configuration list by usin
multimedia_files_append = ['xml']
or
multimedia_files_append = 'xml'

Will append 'xml' to current multimedia_files list

Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
@@ -87,7 +88,7 @@ To use plugins, just insert their file name (without _.py_ extension) in _pre_an
Statistics are stored in dictionaries :

* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **valid_visitors** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of their requests (only if 'keep_requests' is true or filtered)
* **meta** : Final result of month statistics (by year)
@ -103,6 +104,7 @@ The two functions to overload are _load(self)_ that must returns True or False i

For display plugins, a lot of code has been wrote in _display.py_ that simplify the creation on HTML blocks, tables and bar graphs.


Plugins
=======

258
docs/modules.md
@ -6,30 +6,35 @@
* plugins/display/filter_users.py
* plugins/display/hours_stats.py
* plugins/display/ip_to_geo.py
* plugins/display/ip_type.py
* plugins/display/istats_diff.py
* plugins/display/operating_systems.py
* plugins/display/referers_diff.py
* plugins/display/referers.py
* plugins/display/robot_bandwidth.py
* plugins/display/subdomains.py
* plugins/display/top_downloads_diff.py
* plugins/display/top_downloads.py
* plugins/display/top_hits.py
* plugins/display/top_pages_diff.py
* plugins/display/top_pages.py
* plugins/display/top_visitors.py
* plugins/display/visitor_ip.py
* plugins/post_analysis/anonymize_ip.py
* plugins/post_analysis/browsers.py
* plugins/post_analysis/feeds.py
* plugins/post_analysis/filter_users.py
* plugins/post_analysis/hours_stats.py
* plugins/post_analysis/ip_to_geo.py
* plugins/post_analysis/ip_type.py
* plugins/post_analysis/operating_systems.py
* plugins/post_analysis/referers.py
* plugins/post_analysis/reverse_dns.py
* plugins/post_analysis/subdomains.py
* plugins/post_analysis/top_downloads.py
* plugins/post_analysis/top_hits.py
* plugins/post_analysis/top_pages.py
* plugins/pre_analysis/feeds.py
* plugins/pre_analysis/page_to_hit.py
* plugins/pre_analysis/reverse_dns.py
* plugins/pre_analysis/robots.py
@ -47,8 +52,13 @@ iwla
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*

Output files :
DB_ROOT/meta.db
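The core conf values documented above now include excluded_domain_name, reverse_dns_timeout and ignore_url. In iwla.py every entry of excluded_ip, excluded_domain_name, ignore_url and multimedia_files_re is passed to `re.compile()` and later tested with `.match()`, which only anchors at the start of the string. The snippet below is a minimal sketch of how such conf.py values behave; the patterns are hypothetical examples, not iwla defaults, and `compile_all` / `matches_any` are illustrative helpers rather than iwla functions.

```python
import re

# Hypothetical conf.py values (illustrative examples, not iwla defaults)
ignore_url = [r'/wp-login\.php', r'.*\.php$']      # matched against the request URI
multimedia_files_re = [r'.*/thumbnails/']           # matched against the request URI
excluded_domain_name = [r'staging\.example\.org']  # matched against server_name
reverse_dns_timeout = 0.2  # seconds; iwla falls back to 0.5 when unset


def compile_all(patterns):
    """Compile every pattern once, as the IWLA constructor does."""
    return [re.compile(p) for p in patterns]


def matches_any(compiled, value):
    """True if any pattern matches at the start of value (re.match semantics)."""
    return any(p.match(value) for p in compiled)
```

Because `.match()` is anchored, `r'/wp-login\.php'` does not match `/blog/wp-login.php`; prefix a pattern with `.*` when it should match anywhere in the value.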
@ -89,7 +99,7 @@ iwla
nb_visitors

visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@ -313,6 +323,32 @@ plugins.display.ip_to_geo
None


plugins.display.ip_type
-----------------------

Display hook

Add IPv4/IPv6 statistics

Plugin requirements :
post_analysis/ip_type

Conf values needed :
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.istats_diff
---------------------------
@ -433,7 +469,6 @@ plugins.display.robot_bandwidth
None

Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*

Output files :
@ -450,6 +485,32 @@ plugins.display.robot_bandwidth
None


plugins.display.subdomains
--------------------------

Display hook

Add subdomains statistics

Plugin requirements :
post_analysis/subdomains

Conf values needed :
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.top_downloads_diff
----------------------------------
@ -597,7 +658,33 @@ plugins.display.top_visitors
None

Conf values needed :
display_visitor_ip*
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None


plugins.display.visitor_ip
--------------------------

Display hook

Display IP below visitor name

Plugin requirements :
None

Conf values needed :
compact_ip*

Output files :
OUTPUT_ROOT/year/month/index.html
@ -657,7 +744,7 @@ plugins.post_analysis.browsers

Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser

month_stats :
@ -671,38 +758,6 @@ plugins.post_analysis.browsers
None


plugins.post_analysis.feeds
---------------------------

Post analysis hook

Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.

Plugin requirements :
None

Conf values needed :
feeds
feeds_referers*
merge_feeds_parsers*

Output files :
None

Statistics creation :
remote_addr =>
feed_parser
feed_name_analysed

Statistics update :
None

Statistics deletion :
None


plugins.post_analysis.filter_users
----------------------------------
@ -746,13 +801,13 @@ plugins.post_analysis.filter_users

Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location

Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests

Statistics deletion :
@ -826,6 +881,37 @@ plugins.post_analysis.ip_to_geo
None


plugins.post_analysis.ip_type
-----------------------------

Post analysis hook

Detect if IP is IPv4 or IPv6

Plugin requirements :
None

Conf values needed :
None

Output files :
None

Statistics creation :
visits :
remote_ip =>
ip_type

month_stats :
ip_type : {4: XXX, 6: XXX}

Statistics update :
None

Statistics deletion :
None
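The ip_type post-analysis hook documented above tags each visitor with `ip_type` and aggregates a `{4: ..., 6: ...}` counter in month_stats. It works on `remote_ip` rather than `remote_addr`, since the reverse DNS step may already have replaced `remote_addr` with a hostname. The plugin source is not part of this diff, so the following is only one possible way to classify addresses, using Python's standard `ipaddress` module; `ip_type` and `count_ip_types` are hypothetical names, and `visits` is assumed to be a dict keyed by remote_ip, as in the iwla.py changes.

```python
import ipaddress


def ip_type(remote_ip):
    """Return 4 or 6 for a textual IP address, or None if it is not an IP."""
    try:
        return ipaddress.ip_address(remote_ip).version
    except ValueError:
        return None


def count_ip_types(visits):
    """Tag each visitor and aggregate {4: n, 6: m} (hypothetical data shape)."""
    counts = {4: 0, 6: 0}
    for remote_ip, visitor in visits.items():
        version = ip_type(remote_ip)
        visitor['ip_type'] = version
        if version in counts:
            counts[version] += 1
    return counts
```

IPv4-mapped addresses such as `::ffff:192.0.2.1` are reported as version 6 by `ipaddress`; a real plugin might want to fold them into the IPv4 count.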


plugins.post_analysis.operating_systems
---------------------------------------
@ -844,7 +930,7 @@ plugins.post_analysis.operating_systems

Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system

month_stats :
@ -898,30 +984,29 @@ plugins.post_analysis.referers
None


plugins.post_analysis.reverse_dns
---------------------------------
plugins.post_analysis.subdomains
--------------------------------

Post analysis hook

Replace IP by reverse DNS names
Group top pages by subdomains

Plugin requirements :
None
post_analysis/top_pages

Conf values needed :
reverse_dns_timeout*
None

Output files :
None

Statistics creation :
None
month_stats:
subdomains =>
domain => count

Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
None

Statistics deletion :
None
@ -1011,6 +1096,45 @@ plugins.post_analysis.top_pages
None


plugins.pre_analysis.feeds
--------------------------

Pre analysis hook

Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.

Warning : When merge_feeds_parsers is activated, last access display date is the more
recent date of all merged parsers found

Plugin requirements :
None

Conf values needed :
feeds
feeds_agents*
merge_feeds_parsers*

Output files :
None

Statistics creation :
remote_ip =>
feed_parser
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers

Statistics update :
None

Statistics deletion :
None
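The warning above describes the merge rule: with merge_feeds_parsers enabled, parsers sharing a user agent are treated as one reader, and the displayed last access is the most recent one among them. A minimal sketch of that rule, assuming each parser is a dict with hypothetical `remote_ip`, `user_agent` and `last_access` keys (the plugin's real structures are not shown in this diff); `feed_parser_last_access` is the statistic named in the documentation, while `merged_ips` is an illustrative extra.

```python
from datetime import datetime


def merge_feed_parsers(parsers):
    """Group feed parsers by user agent, keeping the most recent access.

    parsers: iterable of dicts with hypothetical keys 'remote_ip',
    'user_agent' and 'last_access' (a datetime).
    Returns {user_agent: merged_entry}.
    """
    merged = {}
    for parser in parsers:
        agent = parser['user_agent']
        if agent not in merged:
            # First parser seen with this user agent becomes the merged entry
            merged[agent] = dict(parser, merged_ips=[],
                                 feed_parser_last_access=datetime.min)
        entry = merged[agent]
        entry['merged_ips'].append(parser['remote_ip'])
        # Displayed date is the most recent access of all merged parsers
        entry['feed_parser_last_access'] = max(entry['feed_parser_last_access'],
                                               parser['last_access'])
    return merged
```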


plugins.pre_analysis.page_to_hit
--------------------------------
@ -1039,6 +1163,35 @@ plugins.pre_analysis.page_to_hit
None


plugins.pre_analysis.reverse_dns
--------------------------------

Pre analysis hook

Replace IP by reverse DNS names

Plugin requirements :
None

Conf values needed :
robot_domains*

Output files :
None

Statistics creation :
None

Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed

Statistics deletion :
None


plugins.pre_analysis.robots
---------------------------
@ -1050,7 +1203,8 @@ plugins.pre_analysis.robots
None

Conf values needed :
None
count_hit_only_visitors
no_referrer_domains

Output files :
None

219
iwla.py
@ -32,6 +32,7 @@ import logging
import gettext
from calendar import monthrange
from datetime import date, datetime
import socket

import default_conf as conf
@ -50,8 +51,13 @@ Conf values needed :
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*

Output files :
DB_ROOT/meta.db
@ -92,7 +98,7 @@ days_stats :
nb_visitors

visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@ -132,9 +138,10 @@ class IWLA(object):

ANALYSIS_CLASS = 'HTTP'
API_VERSION = 1
IWLA_VERSION = '0.6'
IWLA_VERSION = '0.8'
DEFAULT_DNS_TIMEOUT = 0.5

def __init__(self, logLevel, dry_run):
def __init__(self, logLevel, args):
self.meta_infos = {}
self.analyse_started = False
self.current_analysis = {}
@ -142,8 +149,11 @@ class IWLA(object):
self.cache_plugins = {}
self.display = DisplayHTMLBuild(self)
self.valid_visitors = None
self.dry_run = dry_run
self.args = args

self.reverse_dns_timeout = self.getConfValue('reverse_dns_timeout',
IWLA.DEFAULT_DNS_TIMEOUT)

self.log_format_extracted = re.sub(r'([^\$?\w])', r'\\\g<1>', conf.log_format)
self.log_format_extracted = re.sub(r'\$(\w+)', '(?P<\g<1>>.+)', self.log_format_extracted)
self.http_request_extracted = re.compile(r'(?P<http_method>\S+) (?P<http_uri>\S+) (?P<http_version>\S+)')
@ -155,13 +165,22 @@ class IWLA(object):
self.excluded_ip = []
for ip in conf.excluded_ip:
self.excluded_ip += [re.compile(ip)]
self.excluded_domain_name = []
for domain_name in conf.excluded_domain_name:
self.excluded_domain_name += [re.compile(domain_name)]
self.ignore_url = []
for url in conf.ignore_url:
self.ignore_url += [re.compile(url)]
self.multimedia_files_re = []
for file_re in conf.multimedia_files_re:
self.multimedia_files_re += [re.compile(file_re)]
self.plugins = [(conf.PRE_HOOK_DIRECTORY , conf.pre_analysis_hooks),
(conf.POST_HOOK_DIRECTORY , conf.post_analysis_hooks),
(conf.DISPLAY_HOOK_DIRECTORY , conf.display_hooks)]

logging.basicConfig(format='%(name)s %(message)s', level=logLevel)
self.logger = logging.getLogger(self.__class__.__name__)
if self.dry_run:
if self.args.dry_run:
self.logger.info('==> Start (DRY RUN)')
else:
self.logger.info('==> Start')
@ -235,6 +254,26 @@ class IWLA(object):
def getCSSPath(self):
return conf.css_path

def reverseDNS(self, hit):
if hit.get('dns_name_replaced', False):
return hit['remote_addr']

try:
timeout = socket.getdefaulttimeout()
if timeout != self.reverse_dns_timeout:
socket.setdefaulttimeout(self.reverse_dns_timeout)
name, _, _ = socket.gethostbyaddr(hit['remote_ip'])
if timeout != self.reverse_dns_timeout:
socket.setdefaulttimeout(timeout)
hit['remote_addr'] = name.lower()
hit['dns_name_replaced'] = True
except socket.herror:
pass
finally:
hit['dns_analysed'] = True

return hit['remote_addr']
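`reverseDNS` above swaps the process-wide default socket timeout around `socket.gethostbyaddr()`. Two details seem worth noting. The previous timeout is only restored on the success path, so when `gethostbyaddr` raises `socket.herror` the `reverse_dns_timeout` value stays installed. And Python documents `setdefaulttimeout()` as applying to new socket objects, whereas `gethostbyaddr()` goes through the system resolver, so the swap probably does not bound the lookup at all. The sketch below swaps in a different technique, waiting on a worker thread with a deadline via `concurrent.futures`. The `resolver` parameter is a hypothetical hook for testing, and the sketch writes the documented key `dns_analyzed` (the code above writes `dns_analysed`).

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool: a lookup that overruns keeps its worker busy until it returns
_RESOLVER_POOL = ThreadPoolExecutor(max_workers=4)


def reverse_dns(hit, timeout=0.5, resolver=socket.gethostbyaddr):
    """Replace hit['remote_addr'] by its PTR name, keeping hit['remote_ip'].

    `resolver` is a hypothetical injection point (defaults to gethostbyaddr).
    """
    if hit.get('dns_name_replaced', False):
        return hit['remote_addr']
    try:
        future = _RESOLVER_POOL.submit(resolver, hit['remote_ip'])
        # Bound the wait itself instead of relying on the socket default timeout
        name, _, _ = future.result(timeout=timeout)
        hit['remote_addr'] = name.lower()
        hit['dns_name_replaced'] = True
    except (OSError, FutureTimeout):
        # socket.herror/gaierror are OSError subclasses; FutureTimeout = deadline hit
        pass
    finally:
        hit['dns_analyzed'] = True
    return hit['remote_addr']
```

A lookup that overruns the deadline keeps running in its worker thread; only the wait is bounded, which is the usual trade-off of this approach.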

def _clearMeta(self):
self.meta_infos = {
'last_time' : None,
@ -256,7 +295,8 @@ class IWLA(object):
return gzip.open(filename, prot)

def _serialize(self, obj, filename):
if self.dry_run: return
if self.args.dry_run: return
self.logger.info("==> Serialize to %s" % (filename))
base = os.path.dirname(filename)
if not os.path.exists(base):
os.makedirs(base)
@ -299,16 +339,25 @@ class IWLA(object):
if request.endswith(e):
self.logger.debug("True")
return True
# No extension -> page
if not '.' in request.split('/')[-1]:
self.logger.debug("True")
return True
self.logger.debug("False")
return False

def isMultimediaFile(self, request):
self.logger.debug("Is multimedia %s" % (request))
def isMultimediaFile(self, uri):
self.logger.debug("Is multimedia %s" % (uri))
for e in conf.multimedia_files:
if request.lower().endswith(e):
if uri.lower().endswith(e):
self.logger.debug("True")
return True
self.logger.debug("False")

for file_re in self.multimedia_files_re:
if file_re.match(uri):
self.logger.debug("Is multimedia re True")
return True
return False
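`isMultimediaFile` now takes the URI, tries the `conf.multimedia_files` suffixes first (on the lower-cased URI) and then the compiled `multimedia_files_re` patterns with `.match()` (on the original URI). A minimal sketch of that order, with hypothetical extension and pattern lists. The suffix test is a plain `endswith()` with no dot, so if entries are stored without a dot (as the README's `'xml'` example suggests), `'png'` also matches a URI ending in `apng`; the regex test is case-sensitive and anchored at the start of the URI.

```python
import re

# Hypothetical configuration values, for illustration only
MULTIMEDIA_FILES = ['png', 'jpg', 'jpeg', 'gif', 'ico', 'css', 'js']
MULTIMEDIA_FILES_RE = [re.compile(r'.*/thumbnails/')]


def is_multimedia_file(uri, extensions=MULTIMEDIA_FILES, patterns=MULTIMEDIA_FILES_RE):
    """Suffix check first (case-insensitive), then regex check (case-sensitive)."""
    lowered = uri.lower()
    if any(lowered.endswith(ext) for ext in extensions):
        return True
    # re.match is anchored at the start of uri, hence the leading '.*' above
    return any(p.match(uri) for p in patterns)
```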

def isValidVisitor(self, hit):
@ -318,21 +367,32 @@ class IWLA(object):
return True

def isRobot(self, hit):
return hit['robot']
# By default robot is None
return hit['robot'] == True

def _appendHit(self, hit):
remote_addr = hit['remote_addr']

if not remote_addr: return
# Redirected page/hit
if int(hit['status']) in (301, 302, 307, 308):
return

remote_ip = hit['remote_ip']
if not remote_ip: return

for ip in self.excluded_ip:
if ip.match(remote_addr):
if ip.match(remote_ip):
return

if not remote_addr in self.current_analysis['visits'].keys():
request = hit['extract_request']
uri = request.get('extract_uri', request['http_uri'])

for url in self.ignore_url:
if url.match(uri):
return

if not remote_ip in self.current_analysis['visits'].keys():
self._createVisitor(hit)

super_hit = self.current_analysis['visits'][remote_addr]
super_hit = self.current_analysis['visits'][remote_ip]
# Don't keep all requests for robots
if not super_hit['robot']:
super_hit['requests'].append(hit)
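`_appendHit` now bails out, in order, on an empty `remote_addr`, a redirect status (301, 302, 307, 308), an empty `remote_ip`, an `excluded_ip` match and an `ignore_url` match on the extracted URI. `excluded_ip` is now tested against `remote_ip` because `remote_addr` may already hold a reverse DNS name, and visits are keyed by `remote_ip` for the same reason. The filter chain can be sketched as a pure function; `should_skip` and the sample patterns are hypothetical, while the hit fields are the ones used in the code above.

```python
import re

REDIRECT_STATUSES = (301, 302, 307, 308)

# Hypothetical compiled conf values (iwla compiles conf.excluded_ip / conf.ignore_url the same way)
EXCLUDED_IP = [re.compile(r'10\.')]
IGNORE_URL = [re.compile(r'/health')]


def should_skip(hit, excluded_ip=EXCLUDED_IP, ignore_url=IGNORE_URL):
    """Return True when _appendHit would drop this hit (sketch)."""
    if not hit['remote_addr']:
        return True
    if int(hit['status']) in REDIRECT_STATUSES:
        return True
    remote_ip = hit['remote_ip']
    if not remote_ip:
        return True
    # remote_addr may have been replaced by a DNS name, so filter on remote_ip
    if any(p.match(remote_ip) for p in excluded_ip):
        return True
    request = hit['extract_request']
    uri = request.get('extract_uri', request['http_uri'])
    return any(p.match(uri) for p in ignore_url)
```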
@ -343,10 +403,6 @@ class IWLA(object):
super_hit['bandwidth'][0] += int(hit['body_bytes_sent'])
super_hit['last_access'] = self.meta_infos['last_time']

request = hit['extract_request']

uri = request.get('extract_uri', request['http_uri'])

hit['is_page'] = self.isPage(uri)

if super_hit['robot'] or\
@ -375,17 +431,18 @@ class IWLA(object):
super_hit['bandwidth'] = {0:0}
super_hit['last_access'] = self.meta_infos['last_time']
super_hit['requests'] = []
super_hit['robot'] = False
super_hit['robot'] = None
super_hit['hit_only'] = 0

def _normalizeURI(self, uri, removeFileSlash=False):
def _normalizeURI(self, uri, removeFileSlash=True):
if uri == '/': return uri
# Remove protocol
uri = self.protocol_re.sub('', uri)
# Remove double /
uri = self.slash_re.sub('/', uri)
if removeFileSlash and uri[-1] == '/':
uri = uri[:-1]
if removeFileSlash:
while len(uri) > 1 and uri[-1] == '/':
uri = uri[:-1]
return uri
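`_normalizeURI` now defaults to `removeFileSlash=True` and strips every trailing slash while keeping at least one character, whereas the old version removed a single slash and indexed `uri[-1]` whenever `removeFileSlash` was set, which raises `IndexError` on an empty URI. `protocol_re` and `slash_re` are defined outside this diff, so the two patterns below are assumptions chosen to match their comments ("Remove protocol", "Remove double /").

```python
import re

# Assumed equivalents of self.protocol_re and self.slash_re (not shown in the diff)
PROTOCOL_RE = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*://')
SLASH_RE = re.compile(r'/{2,}')


def normalize_uri(uri, remove_file_slash=True):
    """Drop the scheme, collapse repeated slashes, strip trailing slashes."""
    if uri == '/':
        return uri
    uri = PROTOCOL_RE.sub('', uri)
    uri = SLASH_RE.sub('/', uri)
    if remove_file_slash:
        # len(uri) > 1 keeps a lone '/' and makes '' safe
        while len(uri) > 1 and uri[-1] == '/':
            uri = uri[:-1]
    return uri
```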

def _normalizeParameters(self, parameters):
@ -416,8 +473,11 @@ class IWLA(object):
referer_groups = self.uri_re.match(hit['http_referer'])
if referer_groups:
hit['extract_referer'] = referer_groups.groupdict("")
hit['extract_referer']['extract_uri'] = self._normalizeURI(hit['extract_referer']['extract_uri'], True)
hit['extract_referer']['extract_uri'] = self._normalizeURI(hit['extract_referer']['extract_uri'])
hit['extract_referer']['extract_parameters'] = self._normalizeParameters(hit['extract_referer']['extract_parameters'])

hit['remote_ip'] = hit['remote_addr']

return True

def _decodeTime(self, hit):
@ -454,14 +514,16 @@ class IWLA(object):
link = DisplayHTMLRaw(self, '<iframe src="../_stats.html"></iframe>')
page.appendBlock(link)

months_name = ['', self._('Jan'), self._('Feb'), self._('Mar'), self._('Apr'), self._('May'), self._('June'), self._('Jul'), self._('Aug'), self._('Sep'), self._('Oct'), self._('Nov'), self._('Dec')]
_, nb_month_days = monthrange(cur_time.tm_year, cur_time.tm_mon)
days = self.display.createBlock(DisplayHTMLBlockTableWithGraph, self._('By day'), [self._('Day'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth')], None, nb_month_days, range(1,6), [4, 5])
days.setColsCSSClass(['', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
nb_visits = 0
nb_days = 0
for i in range(1, nb_month_days+1):
day = '%d<br/>%s' % (i, time.strftime('%b', cur_time))
full_day = '%02d %s %d' % (i, time.strftime('%b', cur_time), cur_time.tm_year)
month = months_name[int(time.strftime('%m', cur_time), 10)]
day = '%d<br/>%s' % (i, month)
full_day = '%02d %s %d' % (i, month, cur_time.tm_year)
if i in self.current_analysis['days_stats'].keys():
stats = self.current_analysis['days_stats'][i]
row = [full_day, stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
@ -506,52 +568,40 @@ class IWLA(object):
cur_time = time.localtime()
months_name = ['', self._('Jan'), self._('Feb'), self._('Mar'), self._('Apr'), self._('May'), self._('June'), self._('Jul'), self._('Aug'), self._('Sep'), self._('Oct'), self._('Nov'), self._('Dec')]
title = '%s %d' % (self._('Summary'), year)
cols = [self._('Month'), self._('Visitors'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth'), self._('Details')]
graph_cols=range(1,7)
cols = [self._('Month'), self._('Visitors'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth')]
graph_cols=range(1,6)
months = self.display.createBlock(DisplayHTMLBlockTableWithGraph, title, cols, None, 12, graph_cols, [5, 6])
months.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth', ''])
months_ = self.display.createBlock(DisplayHTMLBlockTableWithGraph, title, cols[:-1], None, 12, graph_cols[:-1], [5, 6])
months_.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
months.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
total = [0] * len(cols)
for i in range(1, 13):
month = '%s<br/>%d' % (months_name[i], year)
full_month = '%s %d' % (months_name[i], year)
link_month = '<a target="_top" href="/%d/%02d/index.html">%s</a>' % (year, i, full_month)
if i in month_stats.keys():
stats = month_stats[i]
link = '<a href="%d/%02d/index.html">%s</a>' % (year, i, self._('Details'))
row = [full_month, stats['nb_visitors'], stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
stats['viewed_bandwidth'], stats['not_viewed_bandwidth'], link]
for j in graph_cols:
row = [link_month, stats['nb_visitors'], stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
stats['viewed_bandwidth'], stats['not_viewed_bandwidth']]
for j in range(1,7):
total[j] += row[j]
else:
row = [full_month, 0, 0, 0, 0, 0, 0, '']
row = [full_month, 0, 0, 0, 0, 0, 0]
months.appendRow(row)
viewed_bandwidth = row[5]
not_viewed_bandwidth = row[6]
months.setCellValue(i-1, 5, viewed_bandwidth)
months.setCellValue(i-1, 6, not_viewed_bandwidth)
months.appendShortTitle(month)
months_.appendRow(row[:-1])
months_.setCellValue(i-1, 5, viewed_bandwidth)
months_.setCellValue(i-1, 6, not_viewed_bandwidth)
months_.appendShortTitle(month)
if year == cur_time.tm_year and i == cur_time.tm_mon:
css = months.getCellCSSClass(i-1, 0)
if css: css = '%s %s' % (css, 'iwla_curday')
else: css = 'iwla_curday'
months.setCellCSSClass(i-1, 0, css)
months_.setCellCSSClass(i-1, 0, css)

total[0] = self._('Total')
total[7] = u''
months.appendRow(total)
page.appendBlock(months)

months_.appendRow(total[:-1])
filename = '%d/_stats.html' % (year)
page_ = self.display.createPage(u'', filename, conf.css_path)
page_.appendBlock(months_)
page_.appendBlock(months)
page_.build(conf.DISPLAY_ROOT, False)
months.resetHTML()

def _generateDisplayWholeMonthStats(self):
title = '%s %s' % (self._('Statistics for'), conf.domain_name)
@ -584,7 +634,7 @@ class IWLA(object):

if not os.path.exists(gz_path) or\
os.stat(path).st_mtime > os.stat(gz_path).st_mtime:
if self.dry_run: return
if self.args.dry_run: return
with open(path, 'rb') as f_in, gzip.open(gz_path, 'wb') as f_out:
f_out.write(f_in.read())
@ -598,9 +648,11 @@ class IWLA(object):
break

def _generateDisplay(self):
if self.args.disable_display: return
self._generateDisplayDaysStats()
self._callPlugins(conf.DISPLAY_HOOK_DIRECTORY)
self._generateDisplayWholeMonthStats()
if self.args.dry_run: return
self.display.build(conf.DISPLAY_ROOT)
self._compressFiles(conf.DISPLAY_ROOT)
@ -645,7 +697,7 @@ class IWLA(object):

self._callPlugins(conf.POST_HOOK_DIRECTORY)

if args.display_only:
if self.args.display_only:
if not 'stats' in self.meta_infos.keys():
self.meta_infos['stats'] = {}
self._generateDisplay()
@ -659,7 +711,6 @@ class IWLA(object):

path = self.getDBFilename(cur_time)

self.logger.info("==> Serialize to %s" % (path))
self._serialize(self.current_analysis, path)

# Save month stats
@ -672,7 +723,6 @@ class IWLA(object):
self.meta_infos['stats'][year][month] = duplicated_stats

meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
self.logger.info("==> Serialize to %s" % (meta_path))
self._serialize(self.meta_infos, meta_path)

self._generateDisplay()
@ -708,6 +758,11 @@ class IWLA(object):
self.logger.debug("Not in domain %s" % (hit))
return False

for domain_name in self.excluded_domain_name:
if domain_name.match(hit['server_name']):
self.logger.debug("Domain name %s excluded" % (hit['server_name']))
return False

t = self._decodeTime(hit)

cur_time = self.meta_infos['last_time']
@ -772,8 +827,7 @@ class IWLA(object):
if os.path.exists(output_path): shutil.rmtree(output_path)
month += 1

def start(self, _file, args):
self.args = args
def start(self, _file):
self.start_time = datetime.now()

meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
@ -799,12 +853,15 @@ class IWLA(object):
for l in _file:
# print "line " + l

groups = self.log_re.match(l)
sanitized = l.replace('<', '')
sanitized = sanitized.replace('>', '')

groups = self.log_re.match(sanitized)

if groups:
self._newHit(groups.groupdict(""))
else:
self.logger.warning("No match for %s" % (l))
self.logger.warning("No match for %s" % (sanitized))
#break
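The parsing loop now removes `<` and `>` from each raw line before matching `log_re`, and logs the sanitized line when nothing matches. The ChangeLog only says "Sanitize HTTP requests before analyze"; a plausible motivation is to keep request and referer strings from carrying markup into the generated static HTML. The sketch uses a hypothetical, much simplified log regex standing in for the one iwla builds from `conf.log_format`.

```python
import re

# Hypothetical simplified format: remote address, quoted request, status code
LOG_RE = re.compile(r'(?P<remote_addr>\S+) "(?P<request>[^"]*)" (?P<status>\d{3})')


def sanitize(line):
    """Drop angle brackets, as the loop above does with two str.replace() calls."""
    return line.replace('<', '').replace('>', '')


def parse_lines(lines):
    """Return (hits, rejected): matched group dicts and sanitized non-matching lines."""
    hits, rejected = [], []
    for line in lines:
        sanitized = sanitize(line)
        groups = LOG_RE.match(sanitized)
        if groups:
            # groupdict("") as in iwla: groups that did not participate become ""
            hits.append(groups.groupdict(""))
        else:
            rejected.append(sanitized)
    return hits, rejected
```

Stripping characters changes the stored URI (`/<b>` becomes `/b`); escaping at display time with `html.escape()` would instead preserve the original text, at the cost of handling it in every display plugin.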

if self.analyse_started:
@ -815,6 +872,32 @@ class IWLA(object):
self.logger.info('==> Analyse not started : nothing new')


def displayOnly(self, start_time):
self.start_time = datetime.now()

meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
if os.path.exists(meta_path):
self.logger.info('==> Load previous database')

self.meta_infos = self._deserialize(meta_path) or self._clearMeta()
self.meta_infos['last_time'] = time.strptime(start_time, '%m/%Y')

if self.meta_infos['last_time']:
self.logger.info('Last time')
self.logger.info(self.meta_infos['last_time'])
self.current_analysis = self._deserialize(self.getDBFilename(self.meta_infos['last_time'])) or self._clearVisits()
else:
self._clearVisits()

self.meta_infos['start_analysis_time'] = None

self.cache_plugins = preloadPlugins(self.plugins, self)

self.logger.info('==> Analysing log')

self._generateDayStats()
self._generateMonthStats()

class FileIter(object):
def __init__(self, filenames):
self.filenames = [f for f in filenames.split(',') if f]
@ -880,9 +963,13 @@ if __name__ == '__main__':
default=False,
help='Don\'t compress databases (bigger but faster, not compatible with compressed databases)')

parser.add_argument('-p', '--display-only', dest='display_only', action='store_true',
parser.add_argument('-p', '--display-only', dest='display_only',
default='', type=str,
help='Only generate display for a specific date (month/year)')

parser.add_argument('-P', '--disable-display', dest='disable_display', action='store_true',
default=False,
help='Only generate display')
help='Don\'t generate display')

parser.add_argument('-D', '--dry-run', dest='dry_run', action='store_true',
default=False,
@ -920,14 +1007,18 @@ if __name__ == '__main__':
if not isinstance(loglevel, int):
raise ValueError('Invalid log level: %s' % (args.loglevel))

iwla = IWLA(loglevel, args.dry_run)
iwla = IWLA(loglevel, args)

required_conf = ['analyzed_filename', 'domain_name']
if not validConfRequirements(required_conf, iwla, 'Main Conf'):
sys.exit(0)

if args.stdin:
iwla.start(sys.stdin, args)
if args.display_only:
iwla.displayOnly(args.display_only)
else:
filename = args.file or conf.analyzed_filename
iwla.start(FileIter(filename), args)
if args.stdin:
iwla.start(sys.stdin)
else:
filename = args.file or conf.analyzed_filename
iwla.start(FileIter(filename))
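`-p/--display-only` now takes a month/year string instead of acting as a flag: the main block calls `displayOnly(args.display_only)`, which parses it with `time.strptime(start_time, '%m/%Y')`, and the new `-P/--disable-display` makes `_generateDisplay()` return early. The README usage line above still lists a bare `[-p]`. A minimal, self-contained sketch of just these two options; `build_parser` and `display_month` are hypothetical helpers.

```python
import argparse
import time


def build_parser():
    """Only the two display-related options from the diff above."""
    parser = argparse.ArgumentParser(description='iwla display options (sketch)')
    parser.add_argument('-p', '--display-only', dest='display_only',
                        default='', type=str,
                        help='Only generate display for a specific date (month/year)')
    parser.add_argument('-P', '--disable-display', dest='disable_display',
                        action='store_true', default=False,
                        help="Don't generate display")
    return parser


def display_month(args):
    """Return (year, month) for -p, or None when -p was not given."""
    if not args.display_only:
        return None
    # Same format string as displayOnly(): raises ValueError on e.g. '2025/02'
    parsed = time.strptime(args.display_only, '%m/%Y')
    return parsed.tm_year, parsed.tm_mon
```

With this format an invocation such as `./iwla -p 02/2025` targets February 2025, while `2025/02` (the year/month order shown for `-r` in the usage line) would be rejected.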
@ -5,8 +5,8 @@
msgid ""
msgstr ""
"Project-Id-Version: iwla\n"
"POT-Creation-Date: 2022-11-10 20:07+0100\n"
"PO-Revision-Date: 2022-11-10 20:08+0100\n"
"POT-Creation-Date: 2024-03-16 08:52+0100\n"
"PO-Revision-Date: 2025-02-03 09:57+0100\n"
"Last-Translator: Soutadé <soutade@gmail.com>\n"
"Language-Team: iwla\n"
"Language: fr\n"
@ -15,7 +15,7 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
"Generated-By: pygettext.py 1.5\n"
"X-Generator: Poedit 3.1.1\n"
"X-Generator: Poedit 3.5\n"
"X-Poedit-SourceCharset: UTF-8\n"

#: display.py:32
@ -38,11 +38,11 @@ msgstr "Juillet"
msgid "March"
msgstr "Mars"

#: display.py:32 iwla.py:503
#: display.py:32 iwla.py:474 iwla.py:526
msgid "June"
msgstr "Juin"

#: display.py:32 iwla.py:503
#: display.py:32 iwla.py:474 iwla.py:526
msgid "May"
msgstr "Mai"
@ -66,179 +66,175 @@ msgstr "Octobre"
msgid "September"
msgstr "Septembre"

#: display.py:196
#: display.py:207
msgid "Ratio"
msgstr "Pourcentage"

#: iwla.py:446
#: iwla.py:467
msgid "Statistics"
msgstr "Statistiques"

#: iwla.py:454 iwla.py:505
#: iwla.py:474 iwla.py:526
msgid "Apr"
msgstr "Avr"

#: iwla.py:474 iwla.py:526
msgid "Aug"
msgstr "Août"

#: iwla.py:474 iwla.py:526
msgid "Dec"
msgstr "Déc"

#: iwla.py:474 iwla.py:526
msgid "Feb"
msgstr "Fév"

#: iwla.py:474 iwla.py:526
msgid "Jan"
msgstr "Jan"

#: iwla.py:474 iwla.py:526
msgid "Jul"
msgstr "Jui"

#: iwla.py:474 iwla.py:526
msgid "Mar"
msgstr "Mars"

#: iwla.py:474 iwla.py:526
msgid "Nov"
msgstr "Nov"

#: iwla.py:474 iwla.py:526
msgid "Oct"
msgstr "Oct"

#: iwla.py:474 iwla.py:526
msgid "Sep"
msgstr "Sep"
#: iwla.py:476 iwla.py:528
|
||||
msgid "Not viewed Bandwidth"
|
||||
msgstr "Traffic non vu"
|
||||
|
||||
#: iwla.py:454 iwla.py:505
|
||||
#: iwla.py:476 iwla.py:528
msgid "Visits"
msgstr "Visites"

#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:76 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:118 plugins/display/hours_stats.py:73
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:123 plugins/display/hours_stats.py:73
#: plugins/display/hours_stats.py:83 plugins/display/referers.py:95
#: plugins/display/referers.py:153 plugins/display/top_visitors.py:72
msgid "Pages"
msgstr "Pages"

#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:76 plugins/display/filter_users.py:118
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:123
#: plugins/display/hours_stats.py:73 plugins/display/hours_stats.py:83
#: plugins/display/referers.py:95 plugins/display/referers.py:153
#: plugins/display/top_downloads.py:97 plugins/display/top_visitors.py:72
msgid "Hits"
msgstr "Hits"

#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/hours_stats.py:73 plugins/display/hours_stats.py:83
#: plugins/display/robot_bandwidth.py:81 plugins/display/robot_bandwidth.py:106
#: plugins/display/robot_bandwidth.py:90 plugins/display/robot_bandwidth.py:112
#: plugins/display/top_visitors.py:72
msgid "Bandwidth"
msgstr "Bande passante"

#: iwla.py:454 plugins/display/hours_stats.py:71
#: iwla.py:476 plugins/display/hours_stats.py:71
msgid "By day"
msgstr "Par jour"

#: iwla.py:454 plugins/display/hours_stats.py:73
#: iwla.py:476 plugins/display/hours_stats.py:73
msgid "Day"
msgstr "Jour"

#: iwla.py:493
#: iwla.py:516
msgid "Average"
msgstr "Moyenne"

#: iwla.py:496 iwla.py:541
#: iwla.py:519 iwla.py:553
msgid "Total"
msgstr "Total"

#: iwla.py:503
msgid "Apr"
msgstr "Avr"

#: iwla.py:503
msgid "Aug"
msgstr "Août"

#: iwla.py:503
msgid "Dec"
msgstr "Déc"

#: iwla.py:503
msgid "Feb"
msgstr "Fév"

#: iwla.py:503
msgid "Jan"
msgstr "Jan"

#: iwla.py:503
msgid "Jul"
msgstr "Juil"

#: iwla.py:503
msgid "Mar"
msgstr "Mars"

#: iwla.py:503
msgid "Nov"
msgstr "Nov"

#: iwla.py:503
msgid "Oct"
msgstr "Oct"

#: iwla.py:503
msgid "Sep"
msgstr "Sep"

#: iwla.py:504
#: iwla.py:527
msgid "Summary"
msgstr "Résumé"

#: iwla.py:505
#: iwla.py:528
msgid "Month"
msgstr "Mois"

#: iwla.py:505 iwla.py:517 plugins/display/feeds.py:101
#: plugins/display/filter_users.py:113 plugins/display/operating_systems.py:90
msgid "Details"
msgstr "Détails"

#: iwla.py:505 plugins/display/ip_to_geo.py:94 plugins/display/ip_to_geo.py:112
#: iwla.py:528 plugins/display/ip_to_geo.py:89 plugins/display/ip_to_geo.py:107
msgid "Visitors"
msgstr "Visiteurs"

#: iwla.py:553
#: iwla.py:564
msgid "Statistics for"
msgstr "Statistiques pour"

#: iwla.py:560
#: iwla.py:571
msgid "Last update"
msgstr "Dernière mise à jour"

#: iwla.py:564
#: iwla.py:575
msgid "Time analysis"
msgstr "Durée de l'analyse"

#: iwla.py:566
#: iwla.py:577
msgid "hours"
msgstr "heures"

#: iwla.py:567
#: iwla.py:578
msgid "minutes"
msgstr "minutes"

#: iwla.py:567
#: iwla.py:578
msgid "seconds"
msgstr "secondes"

#: plugins/display/all_visits.py:70 plugins/display/all_visits.py:92
#: plugins/display/all_visits.py:70 plugins/display/all_visits.py:87
#: plugins/display/all_visits_enlight.py:67
msgid "All visits"
msgstr "Toutes les visites"

#: plugins/display/all_visits.py:70 plugins/display/feeds.py:76
#: plugins/display/filter_users.py:118 plugins/display/ip_to_geo.py:62
#: plugins/display/robot_bandwidth.py:81 plugins/display/robot_bandwidth.py:106
#: plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:70 plugins/display/feeds.py:75
#: plugins/display/filter_users.py:123 plugins/display/ip_to_geo.py:62
#: plugins/display/robot_bandwidth.py:90 plugins/display/top_visitors.py:72
#: plugins/display/visitor_ip.py:54
msgid "Host"
msgstr "Hôte"

#: plugins/display/all_visits.py:70 plugins/display/robot_bandwidth.py:81
#: plugins/display/robot_bandwidth.py:106 plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:70 plugins/display/robot_bandwidth.py:90
#: plugins/display/robot_bandwidth.py:112 plugins/display/top_visitors.py:72
msgid "Last seen"
msgstr "Dernière visite"

#: plugins/display/all_visits.py:93 plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:88 plugins/display/top_visitors.py:72
msgid "Top visitors"
msgstr "Top visiteurs"

#: plugins/display/browsers.py:79
#: plugins/display/browsers.py:92
msgid "Browsers"
msgstr "Navigateurs"

#: plugins/display/browsers.py:79 plugins/display/browsers.py:114
#: plugins/display/browsers.py:92 plugins/display/browsers.py:124
msgid "Browser"
msgstr "Navigateur"

#: plugins/display/browsers.py:79 plugins/display/browsers.py:114
#: plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:95 plugins/display/top_hits.py:71
#: plugins/display/top_hits.py:97 plugins/display/top_pages.py:71
#: plugins/display/top_pages.py:96
#: plugins/display/browsers.py:92 plugins/display/browsers.py:124
#: plugins/display/ip_type.py:63 plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:95 plugins/display/subdomains.py:64
#: plugins/display/top_hits.py:71 plugins/display/top_hits.py:97
#: plugins/display/top_pages.py:71 plugins/display/top_pages.py:96
msgid "Entrance"
msgstr "Entrées"

#: plugins/display/browsers.py:99 plugins/display/browsers.py:130
#: plugins/display/browsers.py:109 plugins/display/browsers.py:137
#: plugins/display/filter_users.py:128 plugins/display/referers.py:110
#: plugins/display/referers.py:125 plugins/display/referers.py:140
#: plugins/display/referers.py:163 plugins/display/referers.py:174
@ -246,42 +242,52 @@ msgstr "Entrées"
#: plugins/display/top_downloads.py:83 plugins/display/top_downloads.py:103
#: plugins/display/top_hits.py:82 plugins/display/top_hits.py:103
#: plugins/display/top_pages.py:82 plugins/display/top_pages.py:102
#: plugins/display/top_visitors.py:92
#: plugins/display/top_visitors.py:87
msgid "Others"
msgstr "Autres"

#: plugins/display/browsers.py:106
#: plugins/display/browsers.py:116
msgid "Top Browsers"
msgstr "Top Navigateurs"

#: plugins/display/browsers.py:108
#: plugins/display/browsers.py:118
msgid "All Browsers"
msgstr "Tous les navigateurs"

#: plugins/display/browsers.py:125 plugins/display/filter_users.py:80
msgid "Unknown"
msgstr "Inconnu"

#: plugins/display/feeds.py:70
msgid "All Feeds parsers"
msgstr "Tous les agrégateurs"

#: plugins/display/feeds.py:76
#: plugins/display/feeds.py:75
msgid "All feeds parsers"
msgstr "Tous les agrégateurs"

#: plugins/display/feeds.py:94
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:123
msgid "Last Access"
msgstr "Dernière visite"

#: plugins/display/feeds.py:93
msgid "Merged feeds parsers"
msgstr "Agrégateurs fusionnés"

#: plugins/display/feeds.py:99
#: plugins/display/feeds.py:98
msgid "Feeds parsers"
msgstr "Agrégateurs"

#: plugins/display/feeds.py:106
#: plugins/display/feeds.py:100 plugins/display/filter_users.py:118
#: plugins/display/operating_systems.py:90
msgid "Details"
msgstr "Détails"

#: plugins/display/feeds.py:105
msgid "Found"
msgstr "Trouvé"

#: plugins/display/filter_users.py:77
msgid "Location"
msgstr "Position"

#: plugins/display/filter_users.py:77
msgid "Referer"
msgstr "Origine"
@ -290,17 +296,17 @@ msgstr "Origine"
msgid "User Agent"
msgstr "Navigateur"

#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:111
#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:116
msgid "Filtered users"
msgstr "Utilisateurs filtrés"

#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:118
msgid "Last Access"
msgstr "Dernière visite"
#: plugins/display/filter_users.py:80
msgid "Unknown"
msgstr "Inconnu"

#: plugins/display/hours_stats.py:72
msgid "Fri"
msgstr "Jeu"
msgstr "Ven"

#: plugins/display/hours_stats.py:72
msgid "Mon"
@ -334,19 +340,27 @@ msgstr "Par heures"
msgid "Hours"
msgstr "Heures"

#: plugins/display/ip_to_geo.py:94
#: plugins/display/ip_to_geo.py:89
msgid "Country"
msgstr "Pays"

#: plugins/display/ip_to_geo.py:94 plugins/display/ip_to_geo.py:105
#: plugins/display/ip_to_geo.py:112
#: plugins/display/ip_to_geo.py:89 plugins/display/ip_to_geo.py:100
#: plugins/display/ip_to_geo.py:107
msgid "Countries"
msgstr "Pays"

#: plugins/display/ip_to_geo.py:107
#: plugins/display/ip_to_geo.py:102
msgid "All countries"
msgstr "Tous les pays"

#: plugins/display/ip_type.py:59
msgid "IP types"
msgstr "Types d'IP"

#: plugins/display/ip_type.py:63
msgid "Type"
msgstr "Type"

#: plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:88
msgid "Operating Systems"
@ -409,14 +423,33 @@ msgstr "Top phrases clé"
msgid "All key phrases"
msgstr "Toutes les phrases clés"

#: plugins/display/robot_bandwidth.py:99
#: plugins/display/robot_bandwidth.py:90
msgid "Name"
msgstr "Nom"

#: plugins/display/robot_bandwidth.py:105
msgid "Robots bandwidth"
msgstr "Bande passante robots"

#: plugins/display/robot_bandwidth.py:101
#: plugins/display/robot_bandwidth.py:107
msgid "All robots bandwidth"
msgstr "Bande passante de tous les robots"

#: plugins/display/robot_bandwidth.py:112
msgid "Robot"
msgstr "Robot"

#: plugins/display/subdomains.py:60
msgid "Subdomains"
msgstr "Sous-domaines"

#: plugins/display/subdomains.py:64 plugins/display/top_downloads.py:71
#: plugins/display/top_downloads.py:97 plugins/display/top_hits.py:71
#: plugins/display/top_hits.py:97 plugins/display/top_pages.py:71
#: plugins/display/top_pages.py:96
msgid "URI"
msgstr "URI"

#: plugins/display/top_downloads.py:71
msgid "Hit"
msgstr "Hit"
@ -426,12 +459,6 @@ msgstr "Hit"
msgid "All Downloads"
msgstr "Tous les téléchargements"

#: plugins/display/top_downloads.py:71 plugins/display/top_downloads.py:97
#: plugins/display/top_hits.py:71 plugins/display/top_hits.py:97
#: plugins/display/top_pages.py:71 plugins/display/top_pages.py:96
msgid "URI"
msgstr "URI"

#: plugins/display/top_downloads.py:89
msgid "Top Downloads"
msgstr "Top Téléchargements"
@ -24,7 +24,7 @@ import json

def geoiplookup(ip):
http = urllib3.PoolManager()
r = http.request('GET', f'https://api.geoiplookup.net/?query={ip}&json=true')
r = http.request('GET', f'http://ip-api.com/json/{ip}')

if r.status != 200:
raise Exception(r)
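The hunk above switches geoiplookup() from api.geoiplookup.net to the ip-api.com JSON endpoint. As a minimal sketch, assuming ip-api.com's documented response fields (status, message, country, city, isp), a response body could be mapped onto the city/country/isp keys that the filter_users hook later reads from geo_location. The helper name parse_ip_api_body is hypothetical and not part of iwla.

```python
import json


def parse_ip_api_body(body):
    """Hypothetical helper: map an ip-api.com JSON body onto the keys
    that the filter_users display hook reads from 'geo_location'."""
    data = json.loads(body)
    # ip-api.com reports lookup failures inside the JSON body
    # ("status": "fail" plus a "message"), not only via the HTTP status.
    if data.get('status') != 'success':
        raise ValueError(data.get('message', 'lookup failed'))
    return {
        'city': data.get('city', ''),
        'country': data.get('country', ''),
        'isp': data.get('isp', ''),
    }


if __name__ == '__main__':
    sample = ('{"status": "success", "country": "France", "city": "Paris",'
              ' "isp": "Example ISP", "query": "192.0.2.1"}')
    print(parse_ip_api_body(sample))
```

Under that assumption, checking r.status != 200 alone may not catch a failed lookup, which is why the sketch also inspects the status field.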
@ -71,19 +71,14 @@ class IWLADisplayAllVisits(IPlugin):
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', ''])

for super_hit in last_access:
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])

row = [
address,
super_hit['remote_addr'],
super_hit['viewed_pages'][0],
super_hit['viewed_hits'][0],
super_hit['bandwidth'][0],
time.asctime(super_hit['last_access'])
]
table.appendRow(row)
table.appendRow(row, super_hit['remote_ip'])
page.appendBlock(table)

display.addPage(page)
@ -71,15 +71,9 @@ class IWLADisplayAllVisitsEnlight(IPlugin):
return

for (idx, row) in enumerate(block.rows):
# Direct IP
ip = row[0]
if not ip in visitors.keys():
# name [IP]
ip = self.ip_re.match(row[0])
if not ip: continue
ip = ip[1]
if not ip in visitors.keys():
continue
if visitors[ip].get('enlight', False) or\
visitors[ip].get('filtered', False):
remote_ip = block.objects[idx]
if remote_ip is None or not remote_ip in visitors.keys(): continue
visitor = visitors[remote_ip]
if visitor.get('enlight', False) or\
visitor.get('filtered', False):
block.setCellCSSClass(idx, 0, 'iwla_enlight')
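The all_visits and all_visits_enlight hunks stop parsing "name [IP]" cell text with a regular expression and instead store an object next to each row (appendRow(row, super_hit['remote_ip'])), read back later through block.objects[idx]. A minimal sketch of that pattern follows; TableSketch is a hypothetical stand-in for iwla's DisplayHTMLBlockTable that reuses only the method and attribute names visible in the diff.

```python
class TableSketch:
    """Hypothetical stand-in for a display table that keeps one optional
    object (here: the visitor's remote IP) alongside each row."""

    def __init__(self):
        self.rows = []
        self.objects = []
        self.css = {}

    def appendRow(self, row, obj=None):
        self.rows.append(row)
        self.objects.append(obj)

    def setCellCSSClass(self, row, col, cls):
        self.css[(row, col)] = cls


def enlight(block, visitors):
    """Highlight the first cell of rows whose visitor is enlighted or filtered."""
    for idx, _row in enumerate(block.rows):
        remote_ip = block.objects[idx]
        # Rows without an attached IP (e.g. an "Others" line) are skipped
        if remote_ip is None or remote_ip not in visitors:
            continue
        visitor = visitors[remote_ip]
        if visitor.get('enlight', False) or visitor.get('filtered', False):
            block.setCellCSSClass(idx, 0, 'iwla_enlight')


if __name__ == '__main__':
    visitors = {'192.0.2.1': {'enlight': True}, '198.51.100.7': {}}
    t = TableSketch()
    # The displayed host text no longer needs to contain the IP
    t.appendRow(['host.example.net', 3], '192.0.2.1')
    t.appendRow(['other.example.org', 1], '198.51.100.7')
    t.appendRow(['Others', 4])
    enlight(t, visitors)
    print(t.css)
```

Keying on the stored IP keeps the lookup correct whether or not the cell shows a reverse-DNS name.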
@ -22,8 +22,6 @@ from iwla import IWLA
from iplugin import IPlugin
from display import *

import awstats_data

"""
Display hook

@ -50,6 +48,20 @@ Statistics deletion :
None
"""

browser_icons = {
'Android':'android',
'Android browser (PDA/Phone browser)':'android',
'iPhone':'pdaphone',
'IPhone (PDA/Phone browser)':'pdaphone',
'Edge':'edge',
'Chrome':'chrome',
'Safari':'safari',
'Firefox':'firefox',
'Mozilla':'mozilla',
'Internet Explorer':'msie',
'Opera':'opera',
}

class IWLADisplayBrowsers(IPlugin):
def __init__(self, iwla):
super(IWLADisplayBrowsers, self).__init__(iwla)
@ -60,7 +72,6 @@ class IWLADisplayBrowsers(IPlugin):
self.icon_path = self.iwla.getConfValue('icon_path', '/')
self.max_browsers = self.iwla.getConfValue('max_browsers_displayed', 0)
self.create_browsers = self.iwla.getConfValue('create_browsers_page', True)
self.icon_names = {v:k for (k, v) in awstats_data.browsers_hashid.items()}

return True

@ -81,15 +92,12 @@ class IWLADisplayBrowsers(IPlugin):
total_browsers = [0]*3
new_list = self.max_browsers and browsers[:self.max_browsers] or browsers
for (browser, entrance) in new_list:
if browser != 'unknown':
try:
name = awstats_data.browsers_icons[self.icon_names[browser]]
icon = '<img alt="%s icon" src="/%s/browser/%s.png"/>' % (name, self.icon_path, name)
except:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
if browser in browser_icons.keys():
name = browser_icons[browser]
icon = f'<img alt="{browser} icon" src="/{self.icon_path}/browser/{name}.png"/>'
else:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
browser = 'Unknown'
icon = f'<img alt="Unknown browser icon" src="/{self.icon_path}/browser/unknown.png"/>'
browser = self.iwla._(browser)
table.appendRow([icon, browser, entrance])
total_browsers[2] += entrance
if self.max_browsers:
@ -114,15 +122,12 @@ class IWLADisplayBrowsers(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, title, ['', self.iwla._(u'Browser'), self.iwla._(u'Entrance')])
table.setColsCSSClass(['', '', 'iwla_hit'])
for (browser, entrance) in browsers[:10]:
if browser != 'unknown':
try:
name = awstats_data.browsers_icons[self.icon_names[browser]]
icon = '<img alt="%s icon" src="/%s/browser/%s.png"/>' % (name, self.icon_path, name)
except:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
if browser in browser_icons.keys():
name = browser_icons[browser]
icon = f'<img alt="{browser} icon" src="/{self.icon_path}/browser/{name}.png"/>'
else:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
browser = self.iwla._(u'Unknown')
icon = f'<img alt="Unknown browser icon" src="/{self.icon_path}/browser/unknown.png"/>'
browser = self.iwla._(browser)
table.appendRow([icon, browser, entrance])
total_browsers[2] -= entrance
if total_browsers[2]:
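In the browsers hunks, icons now come from a plain browser_icons dictionary instead of a reverse lookup through awstats_data guarded by a bare except:. A standalone sketch of that lookup with its fallback, using a subset of the mapping from the diff; the function name browser_icon is hypothetical.

```python
# Subset of the browser_icons mapping introduced in the diff
browser_icons = {
    'Chrome': 'chrome',
    'Firefox': 'firefox',
    'Internet Explorer': 'msie',
}


def browser_icon(browser, icon_path):
    """Return the <img> tag for a browser, or the generic unknown icon."""
    name = browser_icons.get(browser)
    if name is None:
        return f'<img alt="Unknown browser icon" src="/{icon_path}/browser/unknown.png"/>'
    return f'<img alt="{browser} icon" src="/{icon_path}/browser/{name}.png"/>'


if __name__ == '__main__':
    print(browser_icon('Firefox', 'icons'))
    print(browser_icon('SomeNewBrowser', 'icons'))
```

A dictionary lookup makes the fallback explicit and, unlike a bare except:, does not silently swallow unrelated errors.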
@ -59,7 +59,7 @@ class IWLADisplayFeeds(IPlugin):
return True

def hook(self):
from plugins.post_analysis.feeds import IWLAPostAnalysisFeeds
from plugins.pre_analysis.feeds import IWLAPostAnalysisFeeds

display = self.iwla.getDisplay()
hits = self.iwla.getCurrentVisits()
@ -70,25 +70,37 @@ class IWLADisplayFeeds(IPlugin):
title = createCurTitle(self.iwla, self.iwla._(u'All Feeds parsers'))
filename = 'all_feeds.html'
path = self.iwla.getCurDisplayPath(filename)
display_visitor_ip = self.iwla.getConfValue('display_visitor_ip', False)

page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'All feeds parsers'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', ''])
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'All feeds parsers'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits')
, self.iwla._(u'Domain'), self.iwla._(u'Subscribers'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', '', '', ''])
rows = []
for super_hit in hits.values():
if not super_hit.get('feed_parser', False): continue
if super_hit['feed_parser'] == IWLAPostAnalysisFeeds.BAD_FEED_PARSER:
if super_hit.get('feed_parser', None) not in (IWLAPostAnalysisFeeds.FEED_PARSER,\
IWLAPostAnalysisFeeds.MERGED_FEED_PARSER):
continue
nb_feeds_parsers += 1
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
if super_hit['feed_parser'] == IWLAPostAnalysisFeeds.MERGED_FEED_PARSER:
address += '*'
address += ' *'
pages = super_hit['not_viewed_pages'][0] + super_hit['viewed_pages'][0]
hits = super_hit['not_viewed_hits'][0] + super_hit['viewed_hits'][0]
table.appendRow([address, pages, hits, time.asctime(super_hit['last_access'])])
last_access = super_hit.get('feed_parser_last_access', super_hit['last_access'])
feed_domain = super_hit.get('feed_domain', '')
if feed_domain:
link = '<a href=\'https://%s/%s\'>%s</a>' % (feed_domain, super_hit.get('feed_uri', ''), feed_domain)
else:
link = ''
subscribers = super_hit.get('feed_subscribers', '')
# Don't overload interface
if not subscribers or subscribers <= 1: subscribers = ''
row = [address, pages, hits, link, subscribers, time.asctime(last_access),
super_hit['remote_ip'], last_access]
rows.append(row)
rows = sorted(rows, key=lambda t: t[7], reverse=True)
for row in rows:
table.appendRow(row[:6], row[6])
page.appendBlock(table)
note = DisplayHTMLRaw(self.iwla, ('<small>*%s</small>' % (self.iwla._(u'Merged feeds parsers'))))
page.appendBlock(note)
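The feeds hunk now collects rows first, sorts them newest-first on last_access (index 7, a time.struct_time), and blanks subscriber counts of 0 or 1. A minimal sketch of both steps, assuming last_access values are time.struct_time objects; the helper names are hypothetical. The sketch also guards against a missing count, because the hunk's default of '' would otherwise be compared with an int, which raises TypeError in Python 3.

```python
import time


def subscribers_cell(value):
    """Show a subscriber count only when it is an int greater than 1."""
    # Missing counts ('' or None) are treated as "no count" instead of
    # being compared with an int.
    if not isinstance(value, int) or value <= 1:
        return ''
    return value


def sort_by_last_access(rows):
    """Newest first; struct_time is a tuple subclass, so it compares field by field."""
    return sorted(rows, key=lambda r: r['last_access'], reverse=True)


if __name__ == '__main__':
    fmt = '%Y-%m-%d %H:%M'
    rows = [
        {'host': 'a.example', 'last_access': time.strptime('2025-01-02 10:00', fmt), 'subscribers': 1},
        {'host': 'b.example', 'last_access': time.strptime('2025-01-03 09:00', fmt), 'subscribers': 12},
        {'host': 'c.example', 'last_access': time.strptime('2024-12-31 23:59', fmt)},
    ]
    for r in sort_by_last_access(rows):
        print(r['host'], subscribers_cell(r.get('subscribers')))
```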
@ -74,25 +74,24 @@ class IWLADisplayFilterUsers(IPlugin):
path = self.iwla.getCurDisplayPath(filename)

page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Filtered users'), [self.iwla._(u'Pages'), self.iwla._(u'Last Access'), self.iwla._(u'User Agent'), self.iwla._(u'Referer')])
table.setColsCSSClass(['iwla_page', '', '', ''])
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Filtered users'), [self.iwla._(u'Pages'), self.iwla._(u'Last Access'), self.iwla._(u'User Agent'), self.iwla._(u'Referer'), self.iwla._(u'Location')])
table.setColsCSSClass(['iwla_page', '', '', '', ''])
row = 0
unknown = self.iwla._('Unknown')
for filtered_user in self.filtered_users:
ip = filtered_user['remote_ip']
ip_title = ip
if 'dns_name_replaced' in hits[ip].keys():
ip_title = '%s [%s]' % (hits[ip]['remote_addr'], ip)
location = filtered_user.get('geo_location', {})
if location:
city = location.get('city', unknown)
country = location.get('countryname', unknown)
if not city: city = unknown
if not country: country = unknown
# At least, one information
if city != unknown or country != unknown:
ip_title = f'{ip_title}<br/>({city}/{country})'
table.appendRow([f'<b>{ip_title}</b>', '', ''])
isp = location.get('isp', '')
str_location = ''
city = location.get('city', unknown)
country = location.get('country', unknown)
if location.get('city', '') or location.get('country', ''):
str_location = f'{city}/{country}'
if isp:
if str_location: str_location += '<br/>'
str_location += isp
table.appendRow([f'<b>{ip_title}</b>', '', '', '', ''])
table.setCellCSSClass(row, 0, '')
for r in hits[ip]['requests'][::-1]:
uri = r['extract_request']['extract_uri'].lower()
@ -107,7 +106,8 @@ class IWLADisplayFilterUsers(IPlugin):
referer = ''
uri = "%s%s" % (r.get('server_name', ''),
r['extract_request']['extract_uri'])
table.appendRow([generateHTMLLink(uri), time.asctime(r['time_decoded']), r['http_user_agent'], referer])
table.appendRow([generateHTMLLink(uri), time.asctime(r['time_decoded']), r['http_user_agent'], referer, str_location], filtered_user['remote_ip'])
str_location = ''
page.appendBlock(table)

display.addPage(page)
@ -123,12 +123,7 @@ class IWLADisplayFilterUsers(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', ''])
for filtered_user in self.filtered_users[:10]:
ip = filtered_user['remote_ip']
if 'dns_name_replaced' in hits[ip].keys():
ip_title = '%s [%s]' % (hits[ip]['remote_addr'], ip)
else:
ip_title = ip
table.appendRow([ip_title, filtered_user['viewed_pages'][0], filtered_user['viewed_hits'][0], time.asctime(hits[ip]['last_access'])])
table.appendRow([filtered_user['remote_addr'], filtered_user['viewed_pages'][0], filtered_user['viewed_hits'][0], time.asctime(filtered_user['last_access'])], filtered_user['remote_ip'])
if len(self.filtered_users) > 10:
table.appendRow([self.iwla._(u'Others'), len(self.filtered_users)-10, '', ''])
table.setCellCSSClass(table.getNbRows()-1, 0, 'iwla_others')
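The filter_users hunk adds a Location column built from the visitor's geo_location entry: "city/country" when either is known, followed by the ISP on a second line. A standalone sketch of that string assembly; location_label is a hypothetical name, and unlike the new code it also treats empty-string fields as unknown, as the removed code did with its "if not city" checks.

```python
def location_label(location, unknown='Unknown'):
    """Build the HTML location cell: 'city/country', then '<br/>isp'."""
    city = location.get('city') or unknown
    country = location.get('country') or unknown
    isp = location.get('isp', '')
    label = ''
    # Only show city/country when at least one of them is actually known
    if location.get('city') or location.get('country'):
        label = f'{city}/{country}'
    if isp:
        if label:
            label += '<br/>'
        label += isp
    return label


if __name__ == '__main__':
    print(location_label({'city': 'Paris', 'country': 'France', 'isp': 'Example ISP'}))
    print(location_label({'country': 'France'}))
```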
@ -64,19 +64,14 @@ class IWLADisplayIPToGeo(IPlugin):
return True

@staticmethod # Needed to have unbound method
def FlagFilter(host, self):
cc = None
host_name = host.split(' ')[0] # hostname or ip
if host_name in self.visitors.keys():
cc = self.visitors[host_name].get('country_code', None)
else:
for visitor in self.visitors.values():
if visitor['remote_addr'] == host_name:
cc = visitor.get('country_code', None)
break
if not cc or cc == 'ip': return None
def FlagFilter(host, remote_ip, self):
if remote_ip is None or not remote_ip in self.visitors.keys():
return None
visitor = self.visitors[remote_ip]
cc = visitor.get('country_code', None)
if not cc: return None
icon = '<img alt="%s flag" src="/%s/flags/%s.png"/>' % (cc, self.icon_path, cc)
return '%s %s' % (icon ,host)
return '%s %s' % (icon, host)

def hook(self):
display = self.iwla.getDisplay()
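The new FlagFilter receives the row's stored remote_ip instead of splitting the displayed host text and scanning every visitor. A standalone sketch of the same lookup; visitors and icon_path are passed explicitly here (a hypothetical signature, since the plugin reads them from self), and returning None is assumed to mean "leave the cell unchanged", as the diff's early returns suggest.

```python
def flag_filter(host, remote_ip, visitors, icon_path):
    """Prefix a host cell with the visitor's country flag, or return None."""
    # Rows without an attached IP, or with an IP we know nothing about, get no flag
    if remote_ip is None or remote_ip not in visitors:
        return None
    cc = visitors[remote_ip].get('country_code')
    if not cc:
        return None
    icon = f'<img alt="{cc} flag" src="/{icon_path}/flags/{cc}.png"/>'
    return f'{icon} {host}'


if __name__ == '__main__':
    visitors = {'192.0.2.1': {'country_code': 'fr'}}
    print(flag_filter('host.example.net', '192.0.2.1', visitors, 'icons'))
```

A direct dictionary lookup replaces the linear scan over self.visitors.values() in the removed code.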
plugins/display/ip_type.py
@ -0,0 +1,69 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023

# This file is part of iwla

# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#

from iwla import IWLA
from iplugin import IPlugin
from display import *

"""
Display hook

Add IPv4/IPv6 statistics

Plugin requirements :
post_analysis/ip_type

Conf values needed :
None

Output files :
OUTPUT_ROOT/year/month/index.html

Statistics creation :
None

Statistics update :
None

Statistics deletion :
None
"""

class IWLADisplayIPType(IPlugin):
    def __init__(self, iwla):
        super(IWLADisplayIPType, self).__init__(iwla)
        self.requires = ['IWLAPostAnalysisIPType']

    def hook(self):
        display = self.iwla.getDisplay()
        ip_types = self.iwla.getMonthStats()['ip_type']

        # IP types in index
        title = self.iwla._(u'IP types')

        index = self.iwla.getDisplayIndex()

        table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._('Type'), self.iwla._(u'Entrance')])
        table.setColsCSSClass(['', 'iwla_hit'])
        types = sorted(ip_types.items(), key=lambda t: t[0])
        for (_type, count) in types:
            table.appendRow([_type, count])
        table.computeRatio(1)
        index.appendBlock(table)
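The ip_type display only renders the counts that post_analysis/ip_type stores in getMonthStats()['ip_type']; that plugin is not part of this diff. A sketch of one plausible way to produce such counts with the standard ipaddress module, assuming the keys are labels like "IPv4"/"IPv6" (an assumption), plus the share of each type, which is roughly what the table's computeRatio(1) call presumably adds. The function names are hypothetical.

```python
import ipaddress
from collections import Counter


def count_ip_types(addresses):
    """Count addresses per IP version; the 'IPv4'/'IPv6' labels are assumed."""
    counts = Counter()
    for addr in addresses:
        try:
            version = ipaddress.ip_address(addr).version
        except ValueError:
            continue  # not a literal IPv4/IPv6 address (e.g. a hostname)
        counts[f'IPv{version}'] += 1
    return dict(counts)


def with_ratio(counts):
    """Rows sorted by type name with a percentage column; no rows for empty input."""
    total = sum(counts.values())
    rows = []
    for name, count in sorted(counts.items()):
        # Guard against a zero total before dividing
        ratio = 100.0 * count / total if total else 0.0
        rows.append((name, count, round(ratio, 1)))
    return rows


if __name__ == '__main__':
    sample = ['192.0.2.1', '2001:db8::1', '198.51.100.2', 'crawler.example.net']
    for row in with_ratio(count_ip_types(sample)):
        print(row)
```

Sorting by name mirrors the hook's sorted(ip_types.items(), key=lambda t: t[0]).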
@ -33,7 +33,6 @@ Plugin requirements :
None

Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*

Output files :
@ -54,7 +53,6 @@ class IWLADisplayRobotBandwidth(IPlugin):
def __init__(self, iwla):
super(IWLADisplayRobotBandwidth, self).__init__(iwla)
self.API_VERSION = 1
self.display_visitor_ip = self.iwla.getConfValue('display_visitor_ip', False)
self.create_all_pages = self.iwla.getConfValue('create_all_robot_bandwidth_page', True)

def load(self):
@ -65,11 +63,22 @@ class IWLADisplayRobotBandwidth(IPlugin):
hits = self.iwla.getCurrentVisits()

bandwidths = []
bandwidths_group = {}
for (k, super_hit) in hits.items():
if not self.iwla.isRobot(super_hit):
continue
bandwidths.append((super_hit, super_hit['bandwidth'][0]))
bandwidths.sort(key=lambda tup: tup[1], reverse=True)
address = super_hit.get('robot_name', '') or super_hit['remote_addr']
if address in bandwidths_group.keys():
group = bandwidths_group[address]
if group['last_access'] < super_hit['last_access']:
group['last_access'] = super_hit['last_access']
group['bandwidth'] += super_hit['bandwidth'][0]
else:
bandwidths_group[address] = {
'last_access':super_hit['last_access'],
'bandwidth':super_hit['bandwidth'][0]
}

# All in a page
if self.create_all_pages:
@ -78,17 +87,14 @@ class IWLADisplayRobotBandwidth(IPlugin):
path = self.iwla.getCurDisplayPath(filename)

page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', ''])
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Name'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', '', ''])
for (super_hit, bandwidth) in bandwidths:
address = super_hit['remote_addr']
if self.display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])

row = [
address,
bandwidth,
super_hit.get('robot_name', ''),
time.asctime(super_hit['last_access'])
]
table.appendRow(row)
@ -103,19 +109,16 @@ class IWLADisplayRobotBandwidth(IPlugin):

# Top in index
index = self.iwla.getDisplayIndex()
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Robot'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', ''])

for (super_hit, bandwidth) in bandwidths[:10]:
address = super_hit['remote_addr']
if self.display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])

_bandwidths_group = dict(sorted(bandwidths_group.items(), key=lambda g: g[1]['bandwidth'], reverse=True))
for i, (k, group) in enumerate(_bandwidths_group.items()):
if i >= 10: break
row = [
address,
bandwidth,
time.asctime(super_hit['last_access'])
k,
group['bandwidth'],
time.asctime(group['last_access'])
]
table.appendRow(row)
index.appendBlock(table)
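The robot_bandwidth index block now groups visits by robot_name (falling back to the address), sums their bandwidth, keeps the most recent last_access, and then shows the ten largest groups. A standalone sketch of that aggregation; for simplicity each visit carries a plain integer bandwidth rather than the per-period list (super_hit['bandwidth'][0]) used in the diff, and the function names are hypothetical.

```python
import time


def group_robot_bandwidth(visits):
    """Merge visits per robot name (or address), summing bandwidth and
    keeping the most recent last_access."""
    groups = {}
    for v in visits:
        key = v.get('robot_name', '') or v['remote_addr']
        group = groups.get(key)
        if group is None:
            groups[key] = {'last_access': v['last_access'], 'bandwidth': v['bandwidth']}
        else:
            group['bandwidth'] += v['bandwidth']
            if group['last_access'] < v['last_access']:
                group['last_access'] = v['last_access']
    return groups


def top_groups(groups, n=10):
    """The n groups with the largest total bandwidth, largest first."""
    return sorted(groups.items(), key=lambda g: g[1]['bandwidth'], reverse=True)[:n]


if __name__ == '__main__':
    visits = [
        {'remote_addr': '192.0.2.1', 'robot_name': 'ExampleBot', 'bandwidth': 100, 'last_access': time.gmtime(1000)},
        {'remote_addr': '192.0.2.2', 'robot_name': 'ExampleBot', 'bandwidth': 50, 'last_access': time.gmtime(2000)},
        {'remote_addr': '198.51.100.9', 'bandwidth': 500, 'last_access': time.gmtime(500)},
    ]
    for name, group in top_groups(group_robot_bandwidth(visits)):
        print(name, group['bandwidth'], time.asctime(group['last_access']))
```

Slicing the sorted list replaces the enumerate/break loop; both keep the first ten entries.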
70
plugins/display/subdomains.py
Normal file
@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023

# This file is part of iwla

# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#

from iwla import IWLA
from iplugin import IPlugin
from display import *

"""
Display hook

Add subdomains statistics

Plugin requirements :
    post_analysis/subdomains

Conf values needed :
    None

Output files :
    OUTPUT_ROOT/year/month/index.html

Statistics creation :
    None

Statistics update :
    None

Statistics deletion :
    None
"""

class IWLADisplaySubDomains(IPlugin):
    def __init__(self, iwla):
        super(IWLADisplaySubDomains, self).__init__(iwla)
        self.requires = ['IWLAPostAnalysisSubDomains']


    def hook(self):
        display = self.iwla.getDisplay()
        subdomains = self.iwla.getMonthStats()['subdomains']

        # Subdomains in index
        title = self.iwla._(u'Subdomains')

        index = self.iwla.getDisplayIndex()

        table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._('URI'), self.iwla._(u'Entrance')])
        table.setColsCSSClass(['', 'iwla_hit'])
        subdomains = sorted(subdomains.items(), key=lambda t: t[1], reverse=True)
        for (uri, count) in subdomains:
            table.appendRow([uri, count])
        table.computeRatio(1)
        index.appendBlock(table)
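The display hook sorts the `subdomains` counters in descending order before appending rows, then calls `table.computeRatio(1)`. The sketch below reproduces that ordering outside iwla and assumes, without checking the `display` module, that `computeRatio()` adds each row's share of the column total as a percentage; `subdomain_rows` is a hypothetical helper:

```python
def subdomain_rows(subdomains):
    """Build (subdomain, count, percentage) rows, highest count first.

    Assumption: the percentage column approximates what computeRatio(1)
    renders; it is computed here as count / total * 100.
    """
    total = sum(subdomains.values())
    rows = []
    for uri, count in sorted(subdomains.items(), key=lambda t: t[1], reverse=True):
        # Guard against an empty month, where total is 0
        ratio = 100.0 * count / total if total else 0.0
        rows.append((uri, count, '%.1f%%' % ratio))
    return rows


print(subdomain_rows({'blog.example.org': 1, 'www.example.org': 3}))
```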
@ -33,7 +33,7 @@ Plugin requirements :
None

Conf values needed :
display_visitor_ip*
None

Output files :
OUTPUT_ROOT/year/month/index.html
@ -72,13 +72,8 @@ class IWLADisplayTopVisitors(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Top visitors'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [3])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', ''])
for super_hit in top_visitors:
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])

row = [
address,
super_hit['remote_addr'],
super_hit['viewed_pages'][0],
super_hit['viewed_hits'][0],
super_hit['bandwidth'][0],
@ -87,7 +82,7 @@ class IWLADisplayTopVisitors(IPlugin):
total[1] -= super_hit['viewed_pages'][0]
total[2] -= super_hit['viewed_hits'][0]
total[3] -= super_hit['bandwidth'][0]
table.appendRow(row)
table.appendRow(row, super_hit['remote_ip'])
if total[1] or total[2] or total[3]:
total[0] = self.iwla._(u'Others')
total[4] = ''
85
plugins/display/visitor_ip.py
Normal file
@ -0,0 +1,85 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023

# This file is part of iwla

# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#

from ipaddress import ip_address

from iwla import IWLA
from iplugin import IPlugin
from display import *

"""
Display hook

Display IP below visitor name

Plugin requirements :
    None

Conf values needed :
    compact_ip*

Output files :
    OUTPUT_ROOT/year/month/index.html

Statistics creation :
    None

Statistics update :
    None

Statistics deletion :
    None
"""

class IWLADisplayVisitorIP(IPlugin):
    def load(self):
        display = self.iwla.getDisplay()
        display.addColumnFilter(self.iwla._(u'Host'), self.IPFilter, {'self':self})

        self.compact_ip = self.iwla.getConfValue('compact_ip', False)

        return True

    def processIP(self, host_name, ip):
        host_name = host_name.replace(ip, 'IP')
        # IPv4
        ip = ip.replace('.', '-')
        # IPv6
        ip = ip.replace(':', '-')
        host_name = host_name.replace(ip, 'IP')
        ip = ip.replace('-', '')
        host_name = host_name.replace(ip, 'IP')

        return host_name

    @staticmethod # Needed to have unbound method
    def IPFilter(host, remote_ip, self):
        if remote_ip is None or not remote_ip in self.visitors.keys(): return None
        visitor = self.visitors[remote_ip]
        if remote_ip == visitor['remote_addr']: return None
        host_name = host
        if self.compact_ip:
            host_name = self.processIP(host_name, visitor['remote_ip'])
            host_name = self.processIP(host_name,
                                       ip_address(visitor['remote_ip']).exploded)
        return '%s [%s]' % (host_name, visitor['remote_ip'])

    def hook(self):
        self.visitors = self.iwla.getCurrentVisits()
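With `compact_ip` enabled, `IPFilter()` passes the host name through `processIP()` twice: once with the address as stored in `remote_ip` and once with its `ip_address(...).exploded` form, so reverse-DNS names that embed a dashed or fully expanded IPv6 address also collapse to `IP`. The standalone functions below restate that logic for experimentation; their names and the sample host names are hypothetical:

```python
from ipaddress import ip_address


def compact_host_name(host_name, ip):
    """Replace spellings of `ip` embedded in `host_name` with 'IP' (as processIP() does)."""
    # Literal address, e.g. "192.0.2.1"
    host_name = host_name.replace(ip, 'IP')
    # Separators turned into dashes: "192-0-2-1" or "2001-db8--1"
    dashed = ip.replace('.', '-').replace(':', '-')
    host_name = host_name.replace(dashed, 'IP')
    # Separators removed entirely: "192021"
    host_name = host_name.replace(dashed.replace('-', ''), 'IP')
    return host_name


def compact(host_name, ip):
    """Apply the replacement for the stored and the fully expanded address."""
    host_name = compact_host_name(host_name, ip)
    return compact_host_name(host_name, ip_address(ip).exploded)


print(compact('192-0-2-1.dsl.example.net', '192.0.2.1'))
print(compact('host-2001-0db8-0000-0000-0000-0000-0000-0001.example.net', '2001:db8::1'))
```

The second pass matters for IPv6 only: `exploded` zero-pads every group, which is how many providers spell addresses in PTR names.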
@ -23,8 +23,6 @@ import re
from iwla import IWLA
from iplugin import IPlugin

import awstats_data

"""
Post analysis hook

@ -41,7 +39,7 @@ Output files :

Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser

month_stats :
@ -55,21 +53,41 @@ Statistics deletion :
None
"""

browser_order = ['android', 'iphone', 'xbox', 'edge', 'opera', 'chrome', 'safari', 'firefox', 'ie', 'mozilla', 'curl', 'wget', 'w3m']

browser_hashid = {
'android':'Android',
'iphone':'iPhone',
'edge':'Edg',
'chrome':['Chrom', 'Chrome'],
'safari':'Safari',
'firefox':'Firefox',
'ie':'MSIE',
'mozilla':'Mozilla',
'opera':'OPR',
'xbox':'Xbox',
'curl':'curl',
'wget':'Wget',
'w3m':'w3m'
}

browser_name = {
'android':'Android',
'iphone':'iPhone',
'edge':'Edge',
'chrome':'Chrome',
'safari':'Safari',
'firefox':'Firefox',
'ie':'Internet Explorer',
'mozilla':'Mozilla',
'opera':'Opera',
'xbox':'Xbox',
'curl':'Curl',
'wget':'Wget',
'w3m':'w3m'
}

class IWLAPostAnalysisBrowsers(IPlugin):
def __init__(self, iwla):
super(IWLAPostAnalysisBrowsers, self).__init__(iwla)
self.API_VERSION = 1

def load(self):
self.browsers = []

for hashid in awstats_data.browsers:
hashid_re = re.compile(r'.*%s.*' % (hashid), re.IGNORECASE)

if hashid in awstats_data.browsers_hashid.keys():
self.browsers.append((hashid_re, awstats_data.browsers_hashid[hashid]))

return True

def hook(self):
stats = self.iwla.getValidVisitors()
@ -81,23 +99,34 @@ class IWLAPostAnalysisBrowsers(IPlugin):

for (k, super_hit) in stats.items():
if not 'browser' in super_hit:
for r in super_hit['requests'][::-1]:
for r in super_hit['requests']:
user_agent = r['http_user_agent']
if not user_agent: continue

browser_name = 'unknown'
for (hashid_re, browser) in self.browsers:
if hashid_re.match(user_agent):
browser_name = browser
break
super_hit['browser'] = browser_name
name = 'Unknown'
for browser in browser_order:
reference = browser_hashid[browser]
if type(reference) == list:
for ref in reference:
if ref in user_agent:
name = browser_name[browser]
break
if name != 'Unknown':
break
else:
if browser_hashid[browser] in user_agent:
name = browser_name[browser]
break
if name == 'Unknown' and 'Macintosh' in user_agent:
name = 'Safari'
super_hit['browser'] = name
break
else:
browser_name = super_hit['browser']
name = super_hit['browser']

if not browser_name in browsers_stats.keys():
browsers_stats[browser_name] = 1
if not name in browsers_stats.keys():
browsers_stats[name] = 1
else:
browsers_stats[browser_name] += 1
browsers_stats[name] += 1

month_stats['browsers'] = browsers_stats
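The new `hook()` walks `browser_order` in sequence and keeps the first entry whose `browser_hashid` token occurs in the user agent, falling back to Safari for unmatched `Macintosh` agents. The order matters because Chromium-based agents also contain `Safari`, and Edge agents also contain `Chrome`. A condensed sketch using a subset of the tables above; the subset and the function name are illustrative, and `any()` replaces the nested loop and flag:

```python
# Subset of browser_order / browser_hashid / browser_name from the hunk above
BROWSER_ORDER = ['edge', 'opera', 'chrome', 'safari', 'firefox', 'curl']
BROWSER_HASHID = {
    'edge': 'Edg',
    'opera': 'OPR',
    'chrome': ['Chrom', 'Chrome'],
    'safari': 'Safari',
    'firefox': 'Firefox',
    'curl': 'curl',
}
BROWSER_NAME = {
    'edge': 'Edge',
    'opera': 'Opera',
    'chrome': 'Chrome',
    'safari': 'Safari',
    'firefox': 'Firefox',
    'curl': 'Curl',
}


def detect_browser(user_agent):
    """Return the display name of the first browser whose token is in user_agent."""
    for browser in BROWSER_ORDER:
        references = BROWSER_HASHID[browser]
        if isinstance(references, str):
            references = [references]
        # Plain, case-sensitive substring test, as in the plugin
        if any(ref in user_agent for ref in references):
            return BROWSER_NAME[browser]
    if 'Macintosh' in user_agent:
        return 'Safari'
    return 'Unknown'


chrome_ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
print(detect_browser(chrome_ua))
print(detect_browser(chrome_ua + ' Edg/120.0.0.0'))
```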
@ -66,13 +66,13 @@ Output files :

Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location

Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests

Statistics deletion :
@ -80,10 +80,6 @@ Statistics deletion :
"""

class IWLAPostAnalysisFilterUsers(IPlugin):
def __init__(self, iwla):
super(IWLAPostAnalysisFilterUsers, self).__init__(iwla)
self.API_VERSION = 1

def _check_filter(self, _filter):
if len(_filter) != 3:
raise Exception('Bad filter ' + ' '.join(_filter))
@ -96,7 +92,7 @@ class IWLAPostAnalysisFilterUsers(IPlugin):
raise Exception('Bad filter ' + ' '.join(_filter))
except Exception as e:
if field == 'ip':
_filter[0] = 'remote_addr'
_filter[0] = 'remote_ip'
if operator not in ('=', '==', '!=', 'in', 'match'):
raise Exception('Bad filter ' + ' '.join(_filter))
if operator == 'match':
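In `_check_filter()` a filter is a three-element list of field, operator and value; the field alias `ip` is rewritten to a visit key (the hunk shows both `remote_addr` and `remote_ip`), and only the operators `=`, `==`, `!=`, `in` and `match` are accepted. A self-contained sketch of that validation, assuming `ip` maps to `remote_ip` and that `match` values are regular expressions worth precompiling; it raises `ValueError` instead of a bare `Exception` and is not iwla's implementation:

```python
import re

ALLOWED_OPERATORS = ('=', '==', '!=', 'in', 'match')


def check_filter(_filter):
    """Validate and normalize a [field, operator, value] filter in place."""
    if len(_filter) != 3:
        raise ValueError('Bad filter ' + ' '.join(map(str, _filter)))
    field, operator, value = _filter
    if field == 'ip':
        # Assumption: 'ip' is an alias for the visit's remote_ip key
        _filter[0] = 'remote_ip'
    if operator not in ALLOWED_OPERATORS:
        raise ValueError('Bad filter ' + ' '.join(map(str, _filter)))
    if operator == 'match':
        # Assumption: compile the pattern once instead of once per visit
        _filter[2] = re.compile(value)
    return _filter


print(check_filter(['ip', '=', '192.0.2.1']))
```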
75
plugins/post_analysis/ip_type.py
Normal file
@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023

# This file is part of iwla

# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#

import re

from iwla import IWLA
from iplugin import IPlugin

"""
Post analysis hook

Detect if IP is IPv4 or IPv6

Plugin requirements :
    None

Conf values needed :
    None

Output files :
    None

Statistics creation :
    visits :
        remote_ip =>
            ip_type

    month_stats :
        ip_type : {4: XXX, 6: XXX}

Statistics update :
    None

Statistics deletion :
    None
"""

class IWLAPostAnalysisIPType(IPlugin):

    def load(self):
        self.v4_re = re.compile('([0-9]{1,3}\.){3}[0-9]{1,3}$')
        return True

    def hook(self):
        stats = self.iwla.getValidVisitors()
        month_stats = self.iwla.getMonthStats()

        if month_stats.get('ip_type', None) is None:
            month_stats['ip_type'] = {4:0, 6:0}

        for (k, super_hit) in stats.items():
            if super_hit.get('ip_type', None) is None:
                if self.v4_re.match(super_hit['remote_ip']):
                    _type = 4
                else:
                    _type = 6
                super_hit['ip_type'] = _type
                month_stats['ip_type'][_type] += 1
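`IWLAPostAnalysisIPType` classifies an address as IPv4 when it matches `([0-9]{1,3}\.){3}[0-9]{1,3}$` and as IPv6 otherwise. In the hunk the pattern is a plain string literal; writing it as a raw string avoids the invalid-escape warning Python emits for `\.` (a `SyntaxWarning` since Python 3.12). The sketch compares that regex test with the standard library's `ipaddress.ip_address(...).version`, which also rejects malformed input; the helper names are hypothetical:

```python
import re
from ipaddress import ip_address

# Same pattern as ip_type.py, written as a raw string
V4_RE = re.compile(r'([0-9]{1,3}\.){3}[0-9]{1,3}$')


def ip_type_regex(remote_ip):
    """4 if remote_ip looks like dotted-quad IPv4, else 6 (the plugin's rule)."""
    return 4 if V4_RE.match(remote_ip) else 6


def ip_type_stdlib(remote_ip):
    """4 or 6 via ipaddress; raises ValueError for invalid addresses."""
    return ip_address(remote_ip).version


def count_ip_types(addresses):
    """Build a {4: n, 6: n} counter like month_stats['ip_type']."""
    counts = {4: 0, 6: 0}
    for address in addresses:
        counts[ip_type_regex(address)] += 1
    return counts


print(count_ip_types(['192.0.2.1', '2001:db8::1', '198.51.100.7']))
```

The regex is cheaper but looser: it accepts `999.0.0.1`, which `ip_address()` rejects. For addresses taken from a web server log either approach should give the same answer.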
@ -41,7 +41,7 @@ Output files :

Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system

month_stats :
@ -136,6 +136,7 @@ class IWLAPostAnalysisReferers(IPlugin):
for r in super_hit['requests'][::-1]:
if not self.iwla.isValidForCurrentAnalysis(r): break
if not r['http_referer']: continue
if not self.iwla.hasBeenViewed(r): continue

uri = r['extract_referer']['extract_uri']
if self.own_domain_re.match(uri): continue
73
plugins/post_analysis/subdomains.py
Normal file
@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023

# This file is part of iwla

# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#

import re

from iwla import IWLA
from iplugin import IPlugin

"""
Post analysis hook

Group top pages by subdomains

Plugin requirements :
    post_analysis/top_pages

Conf values needed :
    None

Output files :
    None

Statistics creation :
    month_stats:
        subdomains =>
            domain => count

Statistics update :
    None

Statistics deletion :
    None
"""

class IWLAPostAnalysisSubDomains(IPlugin):
    def __init__(self, iwla):
        super(IWLAPostAnalysisSubDomains, self).__init__(iwla)
        self.requires = ['IWLAPostAnalysisTopPages']

    def load(self):
        self.domain_re = re.compile(r'([^/]*)/.*')
        return True

    def hook(self):
        month_stats = self.iwla.getMonthStats()
        top_pages = month_stats['top_pages']

        subdomains = {}

        for (uri, count) in top_pages.items():
            domain = self.domain_re.match(uri)
            if not domain: continue
            domain = domain.group(1)
            subdomains[domain] = subdomains.get(domain, 0) + count

        month_stats['subdomains'] = subdomains
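`IWLAPostAnalysisSubDomains` relies on the `top_pages` keys starting with the server name (the top_pages hunk builds them as `"%s%s" % (r.get('server_name', ''), uri)`). Everything before the first `/` is taken as the subdomain and the counts are summed; keys without a `/` do not match `([^/]*)/.*` and are skipped. A standalone sketch with hypothetical data:

```python
import re

DOMAIN_RE = re.compile(r'([^/]*)/.*')


def group_by_subdomain(top_pages):
    """Sum top_pages counts per leading server name."""
    subdomains = {}
    for uri, count in top_pages.items():
        match = DOMAIN_RE.match(uri)
        if not match:
            continue  # no '/' in the key: skipped, as in the plugin
        domain = match.group(1)
        subdomains[domain] = subdomains.get(domain, 0) + count
    return subdomains


top_pages = {
    'www.example.org/index.html': 5,
    'www.example.org/about.html': 2,
    'blog.example.org/post/1': 4,
    'blog.example.org': 1,
}
print(group_by_subdomain(top_pages))
```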
@ -75,7 +75,7 @@ class IWLAPostAnalysisTopPages(IPlugin):

uri = r['extract_request']['extract_uri']
if self.index_re.match(uri):
uri = '/'
uri = ''

uri = "%s%s" % (r.get('server_name', ''), uri)

@ -19,32 +19,40 @@
#

import re
import time

from iwla import IWLA
from iplugin import IPlugin

"""
Post analysis hook
Pre analysis hook

Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.

Warning : When merge_feeds_parsers is activated, last access display date is the more
recent date of all merged parsers found

Plugin requirements :
None

Conf values needed :
feeds
feeds_referers*
feeds_agents*
merge_feeds_parsers*

Output files :
None

Statistics creation :
remote_addr =>
remote_ip =>
feed_parser
feed_name_analysed
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers

Statistics update :
None
@ -66,9 +74,10 @@ class IWLAPostAnalysisFeeds(IPlugin):

def load(self):
feeds = self.iwla.getConfValue('feeds', [])
feeds_referers = self.iwla.getConfValue('feeds_referers', [])
feeds_agents = self.iwla.getConfValue('feeds_agents', [])
self.merge_feeds_parsers = self.iwla.getConfValue('merge_feeds_parsers', False)
_merge_feeds_parsers_list = self.iwla.getConfValue('merge_feeds_parsers_list', [])
_no_merge_feeds_parsers_list = self.iwla.getConfValue('no_merge_feeds_parsers_list', [])

if feeds is None: return False

@ -84,41 +93,61 @@ class IWLAPostAnalysisFeeds(IPlugin):
self.user_agents_re.append(re.compile(r'.*atom.*'))
self.user_agents_re.append(re.compile(r'.*feed.*'))

self.referers_uri = []
for f in feeds_referers:
self.referers_uri.append(f)
for f in feeds_agents:
self.user_agents_re.append(re.compile(f))

self.bad_user_agents_re = []
self.bad_user_agents_re.append(re.compile(r'.*feedback.*'))

self.subscribers_re = re.compile(r'.* ([0-9]+) subscriber.*')

self.merge_feeds_parsers_list = []
for f in _merge_feeds_parsers_list:
self.merge_feeds_parsers_list.append(re.compile(f))


self.no_merge_feeds_parsers_list = []
for f in _no_merge_feeds_parsers_list:
self.no_merge_feeds_parsers_list.append(re.compile(f))

self.merged_feeds = {}

return True

def _appendToMergeCache(self, isFeedParser, key, hit):
hit['feed_parser'] = isFeedParser
# First time, register into dict
if self.merged_feeds.get(key, None) is None:
# Merged
self.merged_feeds[key] = hit
else:
elif hit['remote_ip'] != self.merged_feeds[key]['remote_ip']:
# Next time
# Current must be ignored
hit['feed_parser'] = self.NOT_A_FEED_PARSER
merged_hit = hit
last_access = hit['last_access']
# Previous matched hit must be set as merged
isFeedParser = self.MERGED_FEED_PARSER
hit = self.merged_feeds[key]
hit['feed_parser'] = isFeedParser
hit['feed_parser'] = self.MERGED_FEED_PARSER
hit['viewed_pages'][0] += merged_hit['viewed_pages'][0]
hit['viewed_hits'][0] += merged_hit['viewed_hits'][0]
hit['not_viewed_pages'][0] += merged_hit['not_viewed_pages'][0]
hit['not_viewed_hits'][0] += merged_hit['not_viewed_hits'][0]
if hit['last_access'] < merged_hit['last_access']:
hit['feed_parser_last_access'] = merged_hit['last_access']
else:
hit['feed_parser_last_access'] = hit['last_access']

def mergeFeedsParsers(self, isFeedParser, hit):
if isFeedParser:
# One hit only match
if True or (hit['viewed_hits'][0] + hit['not_viewed_hits'][0]) == 1:
for r in self.merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']):
#print('hit match %s' % (hit['remote_addr']))
self._appendToMergeCache(isFeedParser, r, hit)
return
if isFeedParser in (self.FEED_PARSER, self.MERGED_FEED_PARSER):
for r in self.no_merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']) or r.match(hit['requests'][0]['http_user_agent']):
return
for r in self.merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']) or r.match(hit['requests'][0]['http_user_agent']):
# One group can view multiple different feeds
key = r.pattern + hit.get('feed_domain', '') + hit.get('feed_uri', '')
self._appendToMergeCache(isFeedParser, key, hit)
return
#print("No match for %s : %d" % (hit['remote_addr'], hit['viewed_hits'][0] + hit['not_viewed_hits'][0]))
# Other cases, look for user agent
user_agent = hit['requests'][0]['http_user_agent'].lower()
@ -129,52 +158,68 @@ class IWLAPostAnalysisFeeds(IPlugin):
for hit in hits.values():
isFeedParser = hit.get('feed_parser', None)

# Register already tagged feed parser in merged_feeds
if self.merge_feeds_parsers and\
not isFeedParser in (None, self.BAD_FEED_PARSER):
self.mergeFeedsParsers(isFeedParser, hit)
if isFeedParser == self.NOT_A_FEED_PARSER:
continue


# Second time
if isFeedParser:
if hit['feed_parser'] == self.BAD_FEED_PARSER: continue
if not hit.get('feed_name_analysed', False) and\
hit.get('dns_name_replaced', False):
hit['feed_name_analysed'] = True
addr = hit.get('remote_addr', None)
for r in self.bad_feeds_re:
if r.match(addr):
hit['feed_parser'] = self.BAD_FEED_PARSER
break
# Update last access time
if hit['last_access'] > hit.get('feed_parser_last_access', time.gmtime(0)):
hit['feed_parser_last_access'] = hit['last_access']

# Register already tagged feed parser in merged_feeds
if self.merge_feeds_parsers:
self.mergeFeedsParsers(isFeedParser, hit)
continue

request = hit['requests'][0]
isFeedParser = self.NOT_A_FEED_PARSER
uri = request['extract_request']['extract_uri'].lower()
for regexp in self.feeds_re:
if regexp.match(uri):
if regexp.match(uri) and self.iwla.hasBeenViewed(request):
isFeedParser = self.FEED_PARSER
# Robot that views pages -> bot
if hit['robot']:
if hit['not_viewed_pages'][0]:
isFeedParser = self.NOT_A_FEED_PARSER
# # Robot that views pages -> bot
# if hit['robot']:
# if hit['not_viewed_pages'][0]:
# isFeedParser = self.NOT_A_FEED_PARSER
break

user_agent = request['http_user_agent'].lower()

if isFeedParser == self.NOT_A_FEED_PARSER:
user_agent = request['http_user_agent'].lower()
for regexp in self.user_agents_re:
if regexp.match(user_agent):
isFeedParser = self.FEED_PARSER
break

if isFeedParser == self.NOT_A_FEED_PARSER and\
request.get('extract_referer', False):
referer = request['extract_referer']['extract_uri'].lower()
for uri in self.referers_uri:
if referer == uri:
isFeedParser = self.FEED_PARSER

if isFeedParser == self.FEED_PARSER:
for regexp in self.bad_user_agents_re:
if regexp.match(user_agent):
isFeedParser = self.NOT_A_FEED_PARSER
break

if isFeedParser == self.FEED_PARSER:
if not hit.get('dns_name_replaced', False):
self.iwla.reverseDNS(hit)

if not hit.get('feed_name_analyzed', False):
hit['feed_name_analyzed'] = True
addr = hit.get('remote_addr', None)
for r in self.bad_feeds_re:
if r.match(addr):
isFeedParser = self.NOT_A_FEED_PARSER
break

if isFeedParser == self.FEED_PARSER:
hit['feed_domain'] = request['server_name']
hit['feed_uri'] = uri
hit['feed_subscribers'] = 0

subscribers = self.subscribers_re.match(user_agent)
if subscribers:
hit['feed_subscribers'] = int(subscribers.groups()[0])

hit['robot'] = True
hit['feed_parser'] = isFeedParser
if self.merge_feeds_parsers:
self.mergeFeedsParsers(isFeedParser, hit)
else:
hit['feed_parser'] = isFeedParser
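Two details of the feeds hunk lend themselves to a quick test: the subscriber count is read from the lower-cased user agent with `.* ([0-9]+) subscriber.*`, and merged parsers are keyed on the matching pattern plus `feed_domain` and `feed_uri`, so one group fetching several feeds keeps one entry per feed. The sketch below models hits as plain dicts with hypothetical `user_agent`, `hits` and integer `last_access` fields; it sums counters and keeps the latest access roughly as `_appendToMergeCache()` does, but it is not the plugin's code and skips its user-agent fallback:

```python
import re

SUBSCRIBERS_RE = re.compile(r'.* ([0-9]+) subscriber.*')


def feed_subscribers(user_agent):
    """Subscriber count advertised in a feed fetcher's user agent, else 0."""
    match = SUBSCRIBERS_RE.match(user_agent.lower())
    return int(match.group(1)) if match else 0


def merge_feed_hits(hits, merge_patterns):
    """Group hits whose address or user agent matches one of merge_patterns."""
    merged = {}
    for hit in hits:
        for pattern in merge_patterns:
            if pattern.match(hit['remote_addr']) or pattern.match(hit['user_agent']):
                # One group can fetch several feeds: keep one entry per feed
                key = pattern.pattern + hit.get('feed_domain', '') + hit.get('feed_uri', '')
                entry = merged.setdefault(key, {'hits': 0, 'last_access': hit['last_access']})
                entry['hits'] += hit['hits']
                entry['last_access'] = max(entry['last_access'], hit['last_access'])
                break
    return merged


# Hypothetical hits from two addresses of the same fetcher
hits = [
    {'remote_addr': 'fetcher1.example.net', 'user_agent': 'ExampleReader/2.0',
     'feed_domain': 'blog.example.org', 'feed_uri': '/feed.xml', 'hits': 3, 'last_access': 100},
    {'remote_addr': 'fetcher2.example.net', 'user_agent': 'ExampleReader/2.0',
     'feed_domain': 'blog.example.org', 'feed_uri': '/feed.xml', 'hits': 2, 'last_access': 250},
]
patterns = [re.compile(r'fetcher[0-9]+\.example\.net')]
print(merge_feed_hits(hits, patterns))
print(feed_subscribers('ExampleFetcher/1.0 (+https://fetcher.example.com; 42 subscribers)'))
```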
@ -19,12 +19,13 @@
#

import socket
import re

from iwla import IWLA
from iplugin import IPlugin

"""
Post analysis hook
Pre analysis hook

Replace IP by reverse DNS names

@ -32,7 +33,7 @@ Plugin requirements :
None

Conf values needed :
reverse_dns_timeout*
robot_domains*

Output files :
None
@ -51,31 +52,28 @@ Statistics deletion :
"""

class IWLAPostAnalysisReverseDNS(IPlugin):
DEFAULT_DNS_TIMEOUT = 0.5

def __init__(self, iwla):
super(IWLAPostAnalysisReverseDNS, self).__init__(iwla)
self.API_VERSION = 1

def load(self):
timeout = self.iwla.getConfValue('reverse_dns_timeout',
IWLAPostAnalysisReverseDNS.DEFAULT_DNS_TIMEOUT)
socket.setdefaulttimeout(timeout)
self.robot_domains_re = []
robot_domains = self.iwla.getConfValue('robot_domains', [])
for domain in robot_domains:
self.robot_domains_re.append(re.compile(domain))

return True

def hook(self):
hits = self.iwla.getCurrentVisits()
for (k, hit) in hits.items():
if hit.get('dns_analysed', False): continue
if not hit.get('feed_parser', False) and\
not self.iwla.isValidVisitor(hit):
# Do reverse for feed parser even if they're not
# valid visitors
if hit.get('robot', False) and not hit.get('feed_parser', False):
continue
try:
name, _, _ = socket.gethostbyaddr(k)
hit['remote_addr'] = name.lower()
hit['dns_name_replaced'] = True
except:
pass
finally:
hit['dns_analysed'] = True

res = self.iwla.reverseDNS(hit)

for r in self.robot_domains_re:
if r.match(hit['remote_addr']):
hit['robot'] = True
break
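After the reverse lookup, the hunk marks a visit as a robot when its resolved name matches one of the `robot_domains` regular expressions. Because the patterns are applied with `re.match()`, they are anchored at the start of the host name, so a domain suffix needs a leading `.*`. A minimal sketch with a hypothetical pattern and helper names; `socket.gethostbyaddr()` returns a `(hostname, aliases, addresses)` tuple and raises `socket.herror`, an `OSError` subclass, when no PTR record exists:

```python
import re
import socket


def reverse_dns(ip):
    """Lower-cased PTR name for ip, or None when the lookup fails."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror, socket.gaierror and timeouts are OSError subclasses
        return None
    return name.lower()


def is_robot_domain(host_name, robot_domains_re):
    """True if host_name matches any compiled robot_domains pattern."""
    return any(r.match(host_name) for r in robot_domains_re)


# Hypothetical robot_domains configuration value
robot_domains_re = [re.compile(r'.*\.crawler\.example\.com$')]

print(is_robot_domain('node-12.crawler.example.com', robot_domains_re))
print(is_robot_domain('home.example.net', robot_domains_re))
```

Catching `OSError` rather than using a bare `except:` keeps unrelated errors (and `KeyboardInterrupt`) visible.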
@ -36,7 +36,8 @@ Plugin requirements :
None

Conf values needed :
None
count_hit_only_visitors
no_referrer_domains

Output files :
None
@ -55,15 +56,19 @@ Statistics deletion :
"""

class IWLAPreAnalysisRobots(IPlugin):
def __init__(self, iwla):
super(IWLAPreAnalysisRobots, self).__init__(iwla)
self.API_VERSION = 1

def load(self):
self.awstats_robots = list(map(lambda x : re.compile(('.*%s.*') % (x), re.IGNORECASE), awstats_data.robots))
self.robot_re = re.compile(r'.*bot.*', re.IGNORECASE)
self.crawl_re = re.compile(r'.*crawl.*', re.IGNORECASE)
self.compatible_re = []
self.compatible_re.append(re.compile(r'.*\(.*compatible; ([^;]+);.*\).*'))
self.compatible_re.append(re.compile(r'.*\(.*compatible; (.*)\).*'))
self.compatible_re.append(re.compile(r'.*\(([^;]+); \+.*\).*'))
self.compatible_re.append(re.compile(r'(.*); \(\+.*\)*'))
self.logger = logging.getLogger(self.__class__.__name__)
self.one_hit_only = self.iwla.getConfValue('count_hit_only_visitors', False)
self.no_referrer_domains = self.iwla.getConfValue('no_referrer_domains', [])

return True

@ -73,26 +78,36 @@ class IWLAPreAnalysisRobots(IPlugin):
info = inspect.getframeinfo(frame)

self.logger.debug('%s is a robot (caller %s:%d)' % (k, info.function, info.lineno))
super_hit['robot'] = 1
super_hit['robot'] = True
super_hit['keep_requests'] = False

agent = super_hit['requests'][0]['http_user_agent']
for compatible_re in self.compatible_re:
robot_name = compatible_re.match(agent)
if robot_name:
super_hit['robot_name'] = robot_name[1]
break

# Basic rule to detect robots
def hook(self):
hits = self.iwla.getCurrentVisits()
for (k, super_hit) in hits.items():
if super_hit['robot']:
self.logger.debug('%s is a robot' % (k))
# Already analyzed
if super_hit.get('robot', None) in (True, False):
if super_hit['robot'] == True:
self.logger.debug('%s is a robot' % (k))
continue

if super_hit.get('feed_parser', False):
self.logger.debug('%s is feed parser' % (k))
continue

super_hit['robot'] = False
isRobot = False
referers = 0

first_page = super_hit['requests'][0]


if self.robot_re.match(first_page['http_user_agent']) or\
self.crawl_re.match(first_page['http_user_agent']):
self.logger.debug(first_page['http_user_agent'])
@ -110,12 +125,18 @@ class IWLAPreAnalysisRobots(IPlugin):
continue

# 1) no pages view --> robot
# if not super_hit['viewed_pages'][0]:
# super_hit['robot'] = 1
# continue
if not self.one_hit_only and not super_hit['viewed_pages'][0]:
self._setRobot(k, super_hit)
continue

# 2) Less than 1 hit per page
if super_hit['viewed_pages'][0] and (super_hit['viewed_hits'][0] < super_hit['viewed_pages'][0]):
isRobot = True
# 2.5) 1 page, 1 hit
elif super_hit['viewed_pages'][0] == 1 and super_hit['viewed_hits'][0] == 1:
isRobot = True

if isRobot:
self._setRobot(k, super_hit)
continue

@ -124,30 +145,42 @@ class IWLAPreAnalysisRobots(IPlugin):
self._setRobot(k, super_hit)
continue

not_found_pages = 0
error_codes = 0
not_modified_pages = 0
for hit in super_hit['requests']:
# 5) /robots.txt read
if hit['extract_request']['http_uri'].endswith('/robots.txt'):
self._setRobot(k, super_hit)
break

if int(hit['status']) == 404 or int(hit['status']) == 403:
not_found_pages += 1
# Exception for favicon.png and all apple-*icon*
if int(hit['status']) >= 400 and int(hit['status']) <= 499 and\
'icon' not in hit['extract_request']['http_uri']:
error_codes += 1
elif int(hit['status']) in (304,):
not_modified_pages += 1

# 6) Any referer for hits
if not hit['is_page'] and hit['http_referer']:
if not hit['is_page'] and hit['http_referer'] not in ('', '-'):
referers += 1

if isRobot:
self._setRobot(k, super_hit)
continue

# 7) more than 10 404/403 pages
if not_found_pages > 10:
# 6) Any referer for hits
if super_hit['viewed_hits'][0] and not referers and\
not super_hit['requests'][0]['server_name'] in self.no_referrer_domains:
self._setRobot(k, super_hit)
continue

if not super_hit['viewed_pages'][0] and \
(super_hit['viewed_hits'][0] and not referers):
# 7) more than 10 4XX or 304 pages
if error_codes > 10 or not_modified_pages > 50:
self._setRobot(k, super_hit)
continue

# 8) Special case : 1 page and 1 hit, but not from the same source
if (super_hit['viewed_pages'][0] == 1 and super_hit['viewed_hits'][0] == 1 and len(super_hit['requests']) == 2) and\
(super_hit['requests'][0]['server_name'] != super_hit['requests'][1]['server_name']):
self._setRobot(k, super_hit)
continue
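When `_setRobot()` flags a visit, it now tries the `compatible_re` patterns in order against the first request's user agent and stores the first capture group as `robot_name`. The standalone sketch reuses those four patterns verbatim; the sample user agents are hypothetical, and `match[1]` is equivalent to `match.group(1)`:

```python
import re

# The four patterns from IWLAPreAnalysisRobots.load(), tried in this order
COMPATIBLE_RE = [
    re.compile(r'.*\(.*compatible; ([^;]+);.*\).*'),
    re.compile(r'.*\(.*compatible; (.*)\).*'),
    re.compile(r'.*\(([^;]+); \+.*\).*'),
    re.compile(r'(.*); \(\+.*\)*'),
]


def robot_name(user_agent):
    """Return the robot name extracted from user_agent, or None."""
    for regexp in COMPATIBLE_RE:
        match = regexp.match(user_agent)
        if match:
            return match[1]
    return None


for agent in ('Mozilla/5.0 (compatible; ExampleBot/2.1; +https://bot.example.com/)',
              'Mozilla/5.0 (compatible; ExampleSpider)',
              'Mozilla/5.0 (ExampleCrawler/1.0; +https://crawler.example.com/)',
              'ExampleAgent/1.0; (+https://agent.example.com)',
              'Mozilla/5.0 (X11; Linux x86_64)'):
    print(agent, '->', robot_name(agent))
```

The order goes from the most specific pattern (a `compatible;` token followed by more fields) to the loosest, so a richer user agent is not reduced to a longer, less useful capture.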
@ -68,6 +68,7 @@ td:first-child
.iwla_search { background : #F4F090; }
.iwla_weekend { background : #ECECEC; }
.iwla_curday { font-weight: bold; }
.iwla_curday > a { font-weight: bold; color:black}
.iwla_others { color: #668; }
.iwla_update { background : orange; }
.iwla_new { background : green }