Compare commits

...

64 Commits
v0.6 ... master

Author SHA1 Message Date
b2c9879412 Update ChangeLog 2025-02-03 10:06:16 +01:00
8691302741 Update documentation 2025-02-03 10:06:08 +01:00
Gregory Soutade
53e7390b77 Update AWStats data to version 8.0 2025-02-03 09:49:01 +01:00
Gregory Soutade
9b32a81ddb Add "ignore_url" parameter to iwla 2025-02-03 08:04:57 +01:00
Gregory Soutade
7b0ca661a1 Add rule for robot : forbid only "1 page and 1 hit" 2025-02-03 08:00:25 +01:00
Gregory Soutade
4d0b993aec Update default conf 2024-10-27 09:18:04 +01:00
Gregory Soutade
0211596508 Fix potential division by 0 2024-10-27 09:17:53 +01:00
Gregory Soutade
bde91ca936 Move reverse DNS core management into iwla.py + Add robot_domains configuration 2024-10-27 09:16:01 +01:00
Gregory Soutade
70de0d3aca Add no_merge_feeds_parsers_list conf value 2024-10-27 09:15:39 +01:00
Gregory Soutade
9939922c31 Move feeds and reverse_dns plugins from post_analysis to pre_analysis 2024-10-02 08:27:53 +02:00
Gregory Soutade
6d46ac4461 Robots: Improve compatible keyword detection for robots 2024-07-28 09:25:40 +02:00
Gregory Soutade
46c9ae4f15 Feeds: Add domain and number of subscribers for feed parser.
Set correct date for merged feed parsers
Remove bad BAD_FEED_PARSER state
2024-07-28 09:25:06 +02:00
Gregory Soutade
122ee875fa Sanitize requests before analyze 2024-07-28 09:24:52 +02:00
Gregory Soutade
a03b1dfc4f Core: Add multimedia_re filter 2024-07-28 09:24:33 +02:00
c9500e2e99 Update Changelog 2024-03-16 09:08:24 +01:00
ca3c0eefdf Update documentation 2024-03-16 09:02:06 +01:00
1e09852d18 Update locales 2024-03-16 08:53:44 +01:00
Gregory Soutade
db9009bb28 Update AWStats data (v7.9) 2024-03-05 16:41:31 +01:00
Gregory Soutade
e2210f3eab Update geo ip misc plugin 2024-02-15 10:55:59 +01:00
Gregory Soutade
9db72f41fd Don't analyze referer for non viewed hits/pages 2024-02-15 10:55:38 +01:00
Gregory Soutade
0464a3d8e7 Generate HTML part in dry run mode (but don't write it to disk) 2024-02-15 10:55:04 +01:00
Gregory Soutade
b9566beb80 Set lang value in generated HTML page 2024-02-15 10:54:52 +01:00
Gregory Soutade
d78739157b Remove all trailing slashs of URL before starting analyze 2024-02-03 09:02:55 +01:00
Gregory Soutade
d6d216db4d Improve page detection : check if . is present in last part 2024-01-30 11:27:03 +01:00
Gregory Soutade
974d355dd4 Add no_referrer_domains list to defaut_conf for website that defines this policy 2024-01-30 11:24:52 +01:00
Gregory Soutade
f1ffbe40d8 --display-only switch now takes an argument (month/year), analyze is not yet necessary 2023-08-06 13:25:42 +02:00
Gregory Soutade
83275a8db4 Rework filtered_users output to have full location in a column 2023-08-06 13:25:42 +02:00
Gregory Soutade
07eb919837 Add excluded_domain_name to default_conf 2023-07-14 09:24:47 +02:00
Gregory Soutade
16cd817fec Increase not modified page threshold for robot detection 2023-07-05 09:15:48 +02:00
Gregory Soutade
d32b2440ee Bugfix: flags management for feeds display 2023-06-14 09:21:51 +02:00
Gregory Soutade
9d3ff8b3b7 Add excluded domain option 2023-06-14 09:21:11 +02:00
Gregory Soutade
9c688e1545 Display visitor IP is now a filter 2023-05-21 11:06:16 +02:00
Gregory Soutade
7ef0911fa7 Main key for visits is now remote_ip and not remote_addr 2023-05-21 11:04:40 +02:00
Gregory Soutade
7507b8e77f WIP 2023-04-28 16:17:47 +02:00
b1b92412e0 Update documentation 2023-04-18 20:37:33 +02:00
b1e6f973a6 Update locales 2023-04-18 20:37:24 +02:00
Gregory Soutade
de79f526dd Add IP type plugin 2023-04-18 20:34:45 +02:00
Gregory Soutade
4b58048198 Update browsers with msie and Opera 2023-04-18 20:33:09 +02:00
Gregory Soutade
71d8ee2113 Forgot Firefox icon 2023-03-25 08:11:57 +01:00
Gregory Soutade
440f51ddd1 Remove robot rule 1 page for phones 2023-03-23 21:17:52 +01:00
Gregory Soutade
cad3467c25 Remove detection from awstats dataset for browser 2023-03-23 21:16:54 +01:00
Gregory Soutade
44c76007cd Remove .*bot.* and .*crawl.* from awstats_data 2023-03-11 20:56:18 +01:00
Gregory Soutade
adc04bf753 Update iwla :
* Rework arg variable management
  * Manage dry run at top level
  * 'robot' property is now None by default (allow to do analysis only once)
  * Add --disable-display option
2023-03-11 20:51:44 +01:00
Gregory Soutade
6500d98bdd Do not manage dry run inside display part, but directly in iwla 2023-03-11 20:49:28 +01:00
Gregory Soutade
a0a1f42df4 Update robot detection plugin :
* Do analyze only one time by month
  * Reactivate rule : no page view if count_hit_only_visitors is False
  * Add exception for "Less than 1 hit per page" rule if a phone is used
  * Check for all error codes in 400..499, not only 403 and 404
  * Referer '-' now counted as null
2023-03-11 20:48:17 +01:00
Gregory Soutade
31bc67ceba Replace feed referers by feed user agent 2023-03-11 20:42:56 +01:00
Gregory Soutade
3fdbc282c8 Remove feed parser detection by referer 2023-03-11 20:42:37 +01:00
Gregory Soutade
5f96c44edf Set count_hit_only_visitors to False by default 2023-03-11 20:40:31 +01:00
Gregory Soutade
58d31d842a Merge branch 'master' of soutade.fr:iwla 2023-02-18 08:51:15 +01:00
f871f4975c Update translation 2023-02-18 08:51:05 +01:00
Gregory Soutade
16b0619f19 Fix error : total of not viewed bandwidth not displayed 2023-02-18 08:49:27 +01:00
Gregory Soutade
c8dfdd17f7 Add "compatible" as a criteria for robot 2023-02-18 08:49:14 +01:00
Gregory Soutade
a5bef4ece6 Search for "compatible" in all requests, not only the first one 2023-02-18 08:48:57 +01:00
Gregory Soutade
b29765dda9 Update data with AWStats 7.9 2023-02-04 08:42:26 +01:00
Gregory Soutade
cb18cf928e New way to display global statistics : with links in months names instead of "Details" button
Fix Months name not translated in "By Day" corner
2023-02-04 08:40:36 +01:00
Gregory Soutade
21a21cd68f Add a new rule for robots : 1 page and 1 hit, but not from the same source 2023-02-04 08:40:04 +01:00
72db40d593 Update translations 2023-01-28 09:48:25 +01:00
Gregory Soutade
c6ce5cfc6f Increment IWLA version 2023-01-28 09:45:13 +01:00
Gregory Soutade
185664850d Add subdomains plugin 2023-01-28 09:44:43 +01:00
Gregory Soutade
fef9c783f6 Skip redirected pages/hit at analysis level 2023-01-28 09:42:12 +01:00
Gregory Soutade
6a4fd4e9c8 New rule for robot : more than 10 not modified pages in a row 2023-01-28 09:40:26 +01:00
Gregory Soutade
ac246eabe2 Find robot name in 'compatible' string and group them 2023-01-28 09:38:59 +01:00
Gregory Soutade
9c57ad3ece Feeds : display last access date for merged feed parsers 2023-01-28 09:36:48 +01:00
Gregory Soutade
3a8c667fdc Feeds display: Add "*" after a space in order to have flags 2023-01-28 09:35:48 +01:00
135 changed files with 1646 additions and 603 deletions

View File

@ -1,6 +1,52 @@
v0.8 (03/02/2025)
** User **
Add multimedia_re filter to detect multimedia files by regular expression
Add domain and number of subscribers for feed parser
Add "no_merge_feeds_parsers"_list conf value
Add "robot_domains" configuration value
Add rule for robot : forbid only "1 page and 1 hit"
Add "ignore_url" conf value
** Dev **
Sanitize HTTP requests before analysis
Try to detect robots by "compatible" strings
Move feeds and reverse_dns plugins from post_analysis to pre_analysis
Move reverse DNS core management into iwla.py
** Bugs **
Fix potential division by 0
v0.7 (17/03/2024)
** User **
Awstats data updated (7.9)
Improve page/hit detection
--display-only switch now takes an argument (month/year); a new analysis is no longer necessary
Add --disable-display option
Geo IP plugin updated (use of [ip-api.com](https://ip-api.com/))
Add _subdomains_ plugin
New way to display global statistics: links in month names instead of a "Details" button
Add excluded domain option
** Dev **
Remove detection from awstats dataset for browser
Don't analyze referer for non viewed hits/pages
Remove all trailing slashes of URLs before starting analysis
Main key for visits is now "remote\_ip" and not "remote\_addr"
Add IP type plugin to support IPv4 and IPv6
Update robot detection
Display visitor IP is now a filter
Generate HTML part in dry run mode (but don't write it to disk)
Set lang value in generated HTML page
Add no\_referrer\_domains list to default_conf for websites that define this policy
Set count\_hit\_only\_visitors to False by default
** Bugs **
Flags management for feeds display
v0.6 (20/11/2022)
** User **
Replace track_users by filter_users plugins which can itnerpret conditional filters from configuration
Replace track_users by filter_users plugins which can interpret conditional filters from configuration
Don't save all visitors' requests into the database (saves space and computation). Can be changed in default_conf.py with the keep_requests value
Replace -c argument by config file. Now clean output is -C
Add favicon

File diff suppressed because one or more lines are too long

90
conf.py
View File

@ -1,6 +1,8 @@
#DB_ROOT = './output_db'
#DISPLAY_ROOT = './output_dev'
# Web server log
analyzed_filename = '/var/log/apache2/access.log.1,/var/log/apache2/access.log'
analyzed_filename = '/var/log/apache2/soutade.fr_access.log.1,/var/log/apache2/soutade.fr_access.log'
# Domain name to analyze
domain_name = 'soutade.fr'
@ -10,49 +12,99 @@ display_visitor_ip = True
# Hooks used
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'reverse_dns', 'ip_to_geo']
display_hooks = ['filter_users', 'top_visitors', 'all_visits', 'referers', 'top_pages', 'top_downloads', 'referers_diff', 'ip_to_geo', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'top_downloads_diff', 'robot_bandwidth', 'top_pages_diff']
post_analysis_hooks = ['reverse_dns', 'referers', 'top_pages', 'subdomains', 'top_downloads', 'operating_systems', 'browsers', 'hours_stats', 'feeds', 'ip_to_geo', 'filter_users']
display_hooks = ['filter_users', 'top_visitors', 'all_visits', 'referers', 'top_pages', 'subdomains', 'top_downloads', 'referers_diff', 'ip_to_geo', 'operating_systems', 'browsers', 'feeds', 'hours_stats', 'top_downloads_diff', 'robot_bandwidth', 'top_pages_diff', 'all_visits_enlight']
# Reverse DNS timeout
reverse_dns_timeout = 0.2
# Count these addresses as hits
page_to_hit_conf = [r'^.+/logo[/]?$']
page_to_hit_conf = [r'.+/logo[/]?', r'.+/.+\.py']
# Count these addresses as pages
hit_to_page_conf = [r'^.+/category/.+$', r'^.+/tag/.+$', r'^.+/archive/.+$', r'^.+/ljdc[/]?$', r'^.+/source/tree/.*$', r'^.+/source/file/.*$', r'^.+/search/.+$']
hit_to_page_conf = [
# Blog
r'.+/category/.+', r'.+/tag/.+', r'.+/archive/.+', r'.+/ljdc[/]?', r'.*/search/.+',
# Indefero
r'.+/source/tree/.*', r'.+/source/file/.*', r'.*/index$',
# Denote
r'.*/edit$', r'.*/add$', r'.+/[0-9]+$', r'.*/preferences$', r'.*/search$', r'.*/public_notes$', r'.*/template.*', r'.*/templates$',
# Music
r'.*/music/.*',
]
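For illustration, this is roughly how such regex rules could be applied to reclassify a request. The `classify` helper below is hypothetical, not part of iwla's actual API:

```python
import re

# Hypothetical sketch of hit/page reclassification driven by the two conf lists.
page_to_hit_conf = [r'.+/logo[/]?', r'.+/.+\.py']
hit_to_page_conf = [r'.+/category/.+', r'.+/tag/.+']

def classify(uri, is_page):
    # Pages matching page_to_hit_conf are demoted to hits;
    # hits matching hit_to_page_conf are promoted to pages.
    rules = page_to_hit_conf if is_page else hit_to_page_conf
    for rule in rules:
        if re.match(rule, uri):
            return not is_page
    return is_page

print(classify('/blog/logo/', True))          # demoted to hit
print(classify('/blog/category/dev', False))  # promoted to page
```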
# Because it takes too long to build HTML when there are too many entries
max_hits_displayed = 100
max_downloads_displayed = 100
# Compressed files
compress_output_files = ['html', 'css', 'js']
# Locale in French
#locale = 'fr'
# Tracked IP
tracked_ip = ['192.168.1.1']
locale = 'fr'
# Filtered IP
filtered_ip = [
# r'192.168.*', # Local
]
filtered_ip = ['82.232.68.211', '78.153.243.190', '176.152.215.133',
'83.199.87.88', # Lanion
'193.136.115.1' # Lisbon
]
import re
# google_re = re.compile('.*google.*')
# duck_re = re.compile('.*duckduckgo.*')
soutade_re = re.compile('.*soutade.fr.*')
def my_filter(iwla, visitor):
    # Manage filtered users
    if visitor.get('filtered', False): return True
    filtered = False
    req = visitor['requests'][0]
    if visitor.get('country_code', '') == 'fr' and\
       req['server_name'] in ('blog.soutade.fr', 'www.soutade.fr', 'soutade.fr') and \
       req['extract_request']['extract_uri'] in ('/', '/index.html', '/about.html'):
        referer = req['extract_referer']['extract_uri']
        if referer in ('', '-'):
            # print(f'{req} MATCHED')
            filtered = True
        elif not soutade_re.match(referer):
            # if google_re.match(referer) or duck_re.match(referer):
            #     print(f'{req} MATCHED')
            filtered = True
    # Manage enlight users
    if visitor.get('enlight', None) is None and not visitor.get('feed_parser', False):
        enlight = False
        for i, req in enumerate(visitor['requests']):
            if i == 0 and req['server_name'] in ('indefero.soutade.fr',): break
            if req['server_name'] in ('blog.soutade.fr',) and \
               req['extract_request']['extract_uri'] in ('/', '/index.html'):
                enlight = True
                break
        visitor['enlight'] = enlight
    return filtered
filtered_users = [
# [['country_code', '=', 'cn'], ['viewed_pages', '>=', '100']],
#[['country_code', '=', 'fr'], ['viewed_pages', '>=', '5'], ['viewed_hits', '>=', '5']],
[my_filter],
# [['country_code', '=', 'fr'], my_filter],
]
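The filtered_users entries mix condition triples and callables. As a sketch of how triples such as `['country_code', '=', 'fr']` might be evaluated against a visitor (the `visitor_matches` helper is illustrative; iwla's real filter_users plugin may differ):

```python
import operator

# Hypothetical evaluator for filtered_users condition lists.
OPS = {'=': operator.eq, '>=': operator.ge, '<=': operator.le,
       '>': operator.gt, '<': operator.lt}

def visitor_matches(visitor, conditions):
    # Every condition in the list must hold (AND); callables take (iwla, visitor)
    for cond in conditions:
        if callable(cond):
            if not cond(None, visitor):
                return False
        else:
            field, op, value = cond
            actual = visitor.get(field)
            # Compare numerically when the reference value looks like a number
            if isinstance(value, str) and value.isdigit():
                value, actual = int(value), int(actual or 0)
            if not OPS[op](actual, value):
                return False
    return True

visitor = {'country_code': 'fr', 'viewed_pages': 7}
print(visitor_matches(visitor, [['country_code', '=', 'fr'],
                                ['viewed_pages', '>=', '5']]))  # True
```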
# Excluded IP
excluded_ip = [
r'192.168.*', # Local
r'117.78.58.*', # China ecs-117-78-58-25.compute.hwclouds-dns.com
#'79.141.15.51', # Elsys
#'165.225.20.107', # ST
#'165.225.76.184', # ST #2
'147.161.180.110', # Schneider
'147.161.182.108', # Schneider 2
'147.161.182.86', # Schneider 3
]
# Feeds url
feeds = [r'/atom.xml', r'/rss.xml']
# Feeds referers url
feeds_referers = ['https://feedly.com']
# Feeds agent url
# feeds_agents = [r'.*feedly.com.*']
merge_feeds_parsers = True
merge_feeds_parsers_list = [r'ec2-.*.compute-1.amazonaws.com']
# Consider xml files as multimedia (append to current list)
multimedia_files_append = ['xml']
@ -62,3 +114,5 @@ count_hit_only_visitors = False
# Don't create the bandwidth page for all robots (too big)
create_all_robot_bandwidth_page = False
#keep_requests = True

View File

@ -38,12 +38,16 @@ pages_extensions = ['/', 'htm', 'html', 'xhtml', 'py', 'pl', 'rb', 'php']
# HTTP codes that are considered OK
viewed_http_codes = [200, 304]
# URLs to ignore
ignore_url = []
# If False, don't count visitors that don't GET a page but only resources (images, rss...)
count_hit_only_visitors = True
count_hit_only_visitors = False
# Multimedia extensions (not accounted as downloaded files)
multimedia_files = ['png', 'jpg', 'jpeg', 'gif', 'ico', 'svg',
'css', 'js']
multimedia_files_re = []
# Default resources path (will be symlinked in DISPLAY_OUTPUT)
resources_path = ['resources']
@ -59,7 +63,19 @@ compress_output_files = ['html', 'css', 'js']
locales_path = './locales'
# Default locale (english)
locale = 'en_EN'
locale = 'en'
# Don't keep requests of all visitors into database
keep_requests = False
# Domain names that should be ignored
excluded_domain_name = []
# Domains that set no-referrer as Referrer-Policy
no_referrer_domains = []
# Domains used by robots
robot_domains = []
# Feeds agent identifier
feeds_agents = [r'.*NextCloud-News']
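The feeds and feeds_agents values above suggest a detection scheme along these lines. The `is_feed_parser` helper is a hypothetical sketch, not iwla's actual API:

```python
import re

# Sketch: flag a visitor as a feed parser when its user agent matches
# feeds_agents, or when its first hit is a feed URL from the feeds list.
feeds = [r'/atom.xml', r'/rss.xml']
feeds_agents = [r'.*NextCloud-News']

def is_feed_parser(first_uri, user_agent):
    if any(re.match(agent, user_agent) for agent in feeds_agents):
        return True
    return any(re.match(feed, first_uri) for feed in feeds)

print(is_feed_parser('/rss.xml', 'Mozilla/5.0'))  # True: first hit is a feed URL
```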

View File

@ -39,6 +39,9 @@ class DisplayHTMLRaw(object):
self.iwla = iwla
self.html = html
def resetHTML(self):
self.html = ''
def setRawHTML(self, html):
self.html = html
@ -106,10 +109,12 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
self.rows_cssclasses = []
self.table_css = u'iwla_table'
self.human_readable_cols = human_readable_cols or []
def appendRow(self, row):
self.objects = []
def appendRow(self, row, _object=None):
self.rows.append(listToStr(row))
self.rows_cssclasses.append([u''] * len(row))
self.objects.append(_object)
def insertCol(self, col_number, col_title='', col_css_class=''):
self.cols.insert(col_number, col_title)
@ -139,6 +144,12 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
return self.rows[row][col]
def getRowObject(self, row):
if row < 0 or row >= len(self.rows):
raise ValueError('Invalid index %d' % (row))
return self.objects[row]
def setCellValue(self, row, col, value):
if row < 0 or col < 0 or\
row >= len(self.rows) or col >= len(self.cols):
@ -196,7 +207,7 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
self.insertCol(column_insertion, self.iwla._('Ratio'), u'iwla_hit')
for (index, r) in enumerate(self.rows):
val = r[column] and int(r[column]) or 0
self.setCellValue(index, column_insertion, '%.1f%%' % (float(val*100)/float(total)))
self.setCellValue(index, column_insertion, '%.1f%%' % (total and float(val*100)/float(total) or 0))
def _filter(self, function, column, args):
target_col = None
@ -205,9 +216,9 @@ class DisplayHTMLBlockTable(DisplayHTMLBlock):
target_col = col
break
if target_col is None: return
for row in self.rows:
res = function(row[target_col], **args)
if res:
for idx, row in enumerate(self.rows):
res = function(row[target_col], self.objects[idx], **args)
if res is not None:
row[target_col] = res
def _buildHTML(self):
@ -353,23 +364,21 @@ class DisplayHTMLPage(object):
self.logger.debug('Write %s' % (filename))
if self.iwla.dry_run: return
f = codecs.open(filename, 'w', 'utf-8')
f.write(u'<!DOCTYPE html>')
f.write(u'<html>')
f.write(u'<head>')
f.write(u'<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />')
f.write(u'<link rel="icon" type="image/png" href="/resources/icon/favicon.png"/>')
f.write(u'<!DOCTYPE html>\n')
f.write(u'<html lang="{}">\n'.format(self.iwla.getConfValue('locale', 'en')))
f.write(u'<head>\n')
f.write(u'<meta http-equiv="Content-type" content="text/html; charset=UTF-8"/>\n')
f.write(u'<link rel="icon" type="image/png" href="/resources/icon/favicon.png"/>\n')
for css in self.css_path:
f.write(u'<link rel="stylesheet" href="/%s"/>' % (css))
f.write(u'<link rel="stylesheet" href="/%s"/>\n' % (css))
if self.title:
f.write(u'<title>%s</title>' % (self.title))
f.write(u'</head><body>')
f.write(u'<title>%s</title>\n' % (self.title))
f.write(u'</head><body>\n')
for block in self.blocks:
block.build(f, filters=filters)
if displayVersion:
f.write(u'<div style="text-align:center;width:100%%">Generated by <a href="%s">IWLA %s</a></div>' %
f.write(u'<div style="text-align:center;width:100%%">Generated by <a href="%s">IWLA %s</a></div>\n' %
("http://indefero.soutade.fr/p/iwla", self.iwla.getVersion()))
f.write(u'</body></html>')
f.close()
@ -403,15 +412,14 @@ class DisplayHTMLBuild(object):
self.pages.append(page)
def build(self, root):
if not self.iwla.dry_run:
display_root = self.iwla.getConfValue('DISPLAY_ROOT', '')
if not os.path.exists(display_root):
os.makedirs(display_root)
for res_path in self.iwla.getResourcesPath():
target = os.path.abspath(res_path)
link_name = os.path.join(display_root, res_path)
if not os.path.exists(link_name):
os.symlink(target, link_name)
display_root = self.iwla.getConfValue('DISPLAY_ROOT', '')
if not os.path.exists(display_root):
os.makedirs(display_root)
for res_path in self.iwla.getResourcesPath():
target = os.path.abspath(res_path)
link_name = os.path.join(display_root, res_path)
if not os.path.exists(link_name):
os.symlink(target, link_name)
for page in self.pages:
page.build(root, filters=self.filters)
@ -419,6 +427,21 @@ class DisplayHTMLBuild(object):
def addColumnFilter(self, column, function, args):
self.filters.append(({'column':column, 'args':args}, function))
def getDisplayName(self, visitor):
    display_visitor_ip = True
    compact_host_name = True
    address = visitor['remote_addr']
    if display_visitor_ip and\
       visitor.get('dns_name_replaced', False):
        host_name = address
        if compact_host_name:
            ip = visitor['remote_ip'].replace('.', '-')
            host_name = host_name.replace(ip, 'IP')
            ip = ip.replace('-', '')
            host_name = host_name.replace(ip, 'IP')
        address = '%s [%s]' % (host_name, visitor['remote_ip'])
    return address
#
# Global functions

View File

@ -6,7 +6,7 @@ Introduction
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small core analysis and a lot of filters, which can be viewed as UNIX pipes. The philosophy of iwla is: add, update, delete! That's the job of each filter: modify the statistics until the final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic; it only generates static HTML pages (with a gzip compression option).
Nevertheless, iwla is only focused on HTTP logs. It uses data (search engines definitions) and design from awstats. Moreover, it's not dynamic; it only generates static HTML pages (with a gzip compression option).
Demo
----
@ -16,8 +16,7 @@ A demonstration instance is available [here](https://iwla-demo.soutade.fr)
Usage
-----
./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]
./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-P|--disable-display] [-D|--dry-run]
-c : Configuration file to use (default conf.py)
-C : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
@ -26,6 +25,7 @@ Usage
-r : Reset analysis to a specific date (month/year)
-z : Don't compress databases (bigger but faster, not compatible with compressed databases)
-p : Only generate display
-P : Don't generate display
-d : Dry run (don't write/update files to disk)
Basic usage
@ -48,6 +48,7 @@ You can also append an element to an existing default configuration list by usin
multimedia_files_append = ['xml']
or
multimedia_files_append = 'xml'
Will append 'xml' to current multimedia_files list
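A possible implementation of this *_append mechanism, assuming conf values live in a plain dict (`merge_appends` is illustrative, not iwla's actual code):

```python
# Sketch: merge every "<name>_append" conf value into the base "<name>" list,
# accepting either a single value or a list, as the documentation allows.
def merge_appends(conf):
    for key in list(conf):
        if key.endswith('_append'):
            base = key[:-len('_append')]
            extra = conf[key]
            if not isinstance(extra, list):
                extra = [extra]
            conf.setdefault(base, []).extend(extra)
    return conf

conf = {'multimedia_files': ['png', 'jpg'], 'multimedia_files_append': 'xml'}
print(merge_appends(conf)['multimedia_files'])  # ['png', 'jpg', 'xml']
```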
Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
@ -87,7 +88,7 @@ To use plugins, just insert their file name (without _.py_ extension) in _pre_an
Statistics are stored in dictionaries :
* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **valid_visitors** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of their requests (only if 'keep_requests' is true or filtered)
* **meta** : Final result of month statistics (by year)
@ -103,6 +104,7 @@ The two functions to overload are _load(self)_ that must return True or False i
For display plugins, a lot of code has been written in _display.py_ that simplifies the creation of HTML blocks, tables and bar graphs.
Plugins
=======
@ -116,30 +118,35 @@ Optional configuration values ends with *.
* plugins/display/filter_users.py
* plugins/display/hours_stats.py
* plugins/display/ip_to_geo.py
* plugins/display/ip_type.py
* plugins/display/istats_diff.py
* plugins/display/operating_systems.py
* plugins/display/referers_diff.py
* plugins/display/referers.py
* plugins/display/robot_bandwidth.py
* plugins/display/subdomains.py
* plugins/display/top_downloads_diff.py
* plugins/display/top_downloads.py
* plugins/display/top_hits.py
* plugins/display/top_pages_diff.py
* plugins/display/top_pages.py
* plugins/display/top_visitors.py
* plugins/display/visitor_ip.py
* plugins/post_analysis/anonymize_ip.py
* plugins/post_analysis/browsers.py
* plugins/post_analysis/feeds.py
* plugins/post_analysis/filter_users.py
* plugins/post_analysis/hours_stats.py
* plugins/post_analysis/ip_to_geo.py
* plugins/post_analysis/ip_type.py
* plugins/post_analysis/operating_systems.py
* plugins/post_analysis/referers.py
* plugins/post_analysis/reverse_dns.py
* plugins/post_analysis/subdomains.py
* plugins/post_analysis/top_downloads.py
* plugins/post_analysis/top_hits.py
* plugins/post_analysis/top_pages.py
* plugins/pre_analysis/feeds.py
* plugins/pre_analysis/page_to_hit.py
* plugins/pre_analysis/reverse_dns.py
* plugins/pre_analysis/robots.py
@ -157,8 +164,13 @@ iwla
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*
Output files :
DB_ROOT/meta.db
@ -199,7 +211,7 @@ iwla
nb_visitors
visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@ -423,6 +435,32 @@ plugins.display.ip_to_geo
None
plugins.display.ip_type
-----------------------
Display hook
Add IPv4/IPv6 statistics
Plugin requirements :
post_analysis/ip_type
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.istats_diff
---------------------------
@ -543,7 +581,6 @@ plugins.display.robot_bandwidth
None
Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*
Output files :
@ -560,6 +597,32 @@ plugins.display.robot_bandwidth
None
plugins.display.subdomains
--------------------------
Display hook
Add subdomains statistics
Plugin requirements :
post_analysis/subdomains
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_downloads_diff
----------------------------------
@ -707,7 +770,33 @@ plugins.display.top_visitors
None
Conf values needed :
display_visitor_ip*
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.visitor_ip
--------------------------
Display hook
Display IP below visitor name
Plugin requirements :
None
Conf values needed :
compact_ip*
Output files :
OUTPUT_ROOT/year/month/index.html
@ -767,7 +856,7 @@ plugins.post_analysis.browsers
Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser
month_stats :
@ -781,38 +870,6 @@ plugins.post_analysis.browsers
None
plugins.post_analysis.feeds
---------------------------
Post analysis hook
Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.
Plugin requirements :
None
Conf values needed :
feeds
feeds_referers*
merge_feeds_parsers*
Output files :
None
Statistics creation :
remote_addr =>
feed_parser
feed_name_analysed
Statistics update :
None
Statistics deletion :
None
plugins.post_analysis.filter_users
----------------------------------
@ -856,13 +913,13 @@ plugins.post_analysis.filter_users
Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location
Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests
Statistics deletion :
@ -936,6 +993,37 @@ plugins.post_analysis.ip_to_geo
None
plugins.post_analysis.ip_type
-----------------------------
Post analysis hook
Detect if IP is IPv4 or IPv6
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
visits :
remote_ip =>
ip_type
month_stats :
ip_type : {4: XXX, 6: XXX}
Statistics update :
None
Statistics deletion :
None
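The classification this plugin performs can be sketched with the standard ipaddress module; iwla's own code may differ:

```python
import ipaddress

# Minimal sketch of IPv4/IPv6 classification for the ip_type statistic.
def ip_type(remote_ip):
    try:
        return ipaddress.ip_address(remote_ip).version  # 4 or 6
    except ValueError:
        return None  # not a literal IP (e.g. a reverse-DNS name)

print(ip_type('192.168.1.1'))  # 4
print(ip_type('2001:db8::1'))  # 6
```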
plugins.post_analysis.operating_systems
---------------------------------------
@ -954,7 +1042,7 @@ plugins.post_analysis.operating_systems
Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system
month_stats :
@ -1008,30 +1096,29 @@ plugins.post_analysis.referers
None
plugins.post_analysis.reverse_dns
---------------------------------
plugins.post_analysis.subdomains
--------------------------------
Post analysis hook
Replace IP by reverse DNS names
Group top pages by subdomains
Plugin requirements :
None
post_analysis/top_pages
Conf values needed :
reverse_dns_timeout*
None
Output files :
None
Statistics creation :
None
month_stats:
subdomains =>
domain => count
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
None
Statistics deletion :
None
@ -1121,6 +1208,45 @@ plugins.post_analysis.top_pages
None
plugins.pre_analysis.feeds
--------------------------
Pre analysis hook
Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.
Warning: when merge_feeds_parsers is activated, the displayed last access date is the most recent date of all merged parsers found
Plugin requirements :
None
Conf values needed :
feeds
feeds_agents*
merge_feeds_parsers*
Output files :
None
Statistics creation :
remote_ip =>
feed_parser
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers
Statistics update :
None
Statistics deletion :
None
plugins.pre_analysis.page_to_hit
--------------------------------
@ -1149,6 +1275,35 @@ plugins.pre_analysis.page_to_hit
None
plugins.pre_analysis.reverse_dns
--------------------------------
Pre analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
robot_domains*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
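What this plugin describes, resolving a visitor IP to a host name while giving up quickly on failure, can be sketched as follows. Note that socket.setdefaulttimeout() does not reliably bound gethostbyaddr() on all platforms, so the timeout here is best-effort; iwla's actual implementation may differ:

```python
import socket

# Sketch: best-effort reverse DNS with a short timeout, mirroring the
# reverse_dns_timeout conf value described above.
def reverse_dns(remote_ip, timeout=0.2):
    old_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(remote_ip)
        return name.lower()
    except OSError:  # herror/gaierror both derive from OSError
        return None  # keep the raw IP when resolution fails
    finally:
        socket.setdefaulttimeout(old_timeout)
```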
plugins.pre_analysis.robots
---------------------------
@ -1160,7 +1315,8 @@ plugins.pre_analysis.robots
None
Conf values needed :
None
count_hit_only_visitors
no_referrer_domains
Output files :
None

View File

@ -6,7 +6,7 @@ Introduction
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small core analysis and a lot of filters, which can be viewed as UNIX pipes. The philosophy of iwla is: add, update, delete! That's the job of each filter: modify the statistics until the final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic; it only generates static HTML pages (with a gzip compression option).
Nevertheless, iwla is only focused on HTTP logs. It uses data (search engines definitions) and design from awstats. Moreover, it's not dynamic; it only generates static HTML pages (with a gzip compression option).
Demo
----
@ -16,8 +16,7 @@ A demonstration instance is available [here](https://iwla-demo.soutade.fr)
Usage
-----
./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]
./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-P|--disable-display] [-D|--dry-run]
-c : Configuration file to use (default conf.py)
-C : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
@ -26,6 +25,7 @@ Usage
-r : Reset analysis to a specific date (month/year)
-z : Don't compress databases (bigger but faster, not compatible with compressed databases)
-p : Only generate display
-P : Don't generate display
-d : Dry run (don't write/update files to disk)
Basic usage
@ -48,6 +48,7 @@ You can also append an element to an existing default configuration list by usin
multimedia_files_append = ['xml']
or
multimedia_files_append = 'xml'
Will append 'xml' to current multimedia_files list
Then, you can launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it, go into _output_ and type
@ -87,7 +88,7 @@ To use plugins, just insert their file name (without _.py_ extension) in _pre_an
Statistics are stored in dictionaries :
* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **valid_visitors** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of their requests (only if 'keep_requests' is true or filtered)
* **meta** : Final result of month statistics (by year)
@ -103,6 +104,7 @@ The two functions to overload are _load(self)_ that must return True or False i
For display plugins, a lot of code has been written in _display.py_ that simplifies the creation of HTML blocks, tables and bar graphs.
Plugins
=======

View File

@ -6,30 +6,35 @@
* plugins/display/filter_users.py
* plugins/display/hours_stats.py
* plugins/display/ip_to_geo.py
* plugins/display/ip_type.py
* plugins/display/istats_diff.py
* plugins/display/operating_systems.py
* plugins/display/referers_diff.py
* plugins/display/referers.py
* plugins/display/robot_bandwidth.py
* plugins/display/subdomains.py
* plugins/display/top_downloads_diff.py
* plugins/display/top_downloads.py
* plugins/display/top_hits.py
* plugins/display/top_pages_diff.py
* plugins/display/top_pages.py
* plugins/display/top_visitors.py
* plugins/display/visitor_ip.py
* plugins/post_analysis/anonymize_ip.py
* plugins/post_analysis/browsers.py
* plugins/post_analysis/feeds.py
* plugins/post_analysis/filter_users.py
* plugins/post_analysis/hours_stats.py
* plugins/post_analysis/ip_to_geo.py
* plugins/post_analysis/ip_type.py
* plugins/post_analysis/operating_systems.py
* plugins/post_analysis/referers.py
* plugins/post_analysis/reverse_dns.py
* plugins/post_analysis/subdomains.py
* plugins/post_analysis/top_downloads.py
* plugins/post_analysis/top_hits.py
* plugins/post_analysis/top_pages.py
* plugins/pre_analysis/feeds.py
* plugins/pre_analysis/page_to_hit.py
* plugins/pre_analysis/reverse_dns.py
* plugins/pre_analysis/robots.py
@ -47,8 +52,13 @@ iwla
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*
Output files :
DB_ROOT/meta.db
@ -89,7 +99,7 @@ iwla
nb_visitors
visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@ -313,6 +323,32 @@ plugins.display.ip_to_geo
None
plugins.display.ip_type
-----------------------
Display hook
Add IPv4/IPv6 statistics
Plugin requirements :
post_analysis/ip_type
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.istats_diff
---------------------------
@ -433,7 +469,6 @@ plugins.display.robot_bandwidth
None
Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*
Output files :
@ -450,6 +485,32 @@ plugins.display.robot_bandwidth
None
plugins.display.subdomains
--------------------------
Display hook
Add subdomains statistics
Plugin requirements :
post_analysis/subdomains
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_downloads_diff
----------------------------------
@ -597,7 +658,33 @@ plugins.display.top_visitors
None
Conf values needed :
display_visitor_ip*
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.visitor_ip
--------------------------
Display hook
Display IP below visitor name
Plugin requirements :
None
Conf values needed :
compact_ip*
Output files :
OUTPUT_ROOT/year/month/index.html
@ -657,7 +744,7 @@ plugins.post_analysis.browsers
Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser
month_stats :
@ -671,38 +758,6 @@ plugins.post_analysis.browsers
None
plugins.post_analysis.feeds
---------------------------
Post analysis hook
Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.
Plugin requirements :
None
Conf values needed :
feeds
feeds_referers*
merge_feeds_parsers*
Output files :
None
Statistics creation :
remote_addr =>
feed_parser
feed_name_analysed
Statistics update :
None
Statistics deletion :
None
plugins.post_analysis.filter_users
----------------------------------
@ -746,13 +801,13 @@ plugins.post_analysis.filter_users
Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location
Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests
Statistics deletion :
@ -826,6 +881,37 @@ plugins.post_analysis.ip_to_geo
None
plugins.post_analysis.ip_type
-----------------------------
Post analysis hook
Detect if IP is IPv4 or IPv6
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
visits :
remote_ip =>
ip_type
month_stats :
ip_type : {4: XXX, 6: XXX}
Statistics update :
None
Statistics deletion :
None
plugins.post_analysis.operating_systems
---------------------------------------
@ -844,7 +930,7 @@ plugins.post_analysis.operating_systems
Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system
month_stats :
@ -898,30 +984,29 @@ plugins.post_analysis.referers
None
plugins.post_analysis.reverse_dns
---------------------------------
plugins.post_analysis.subdomains
--------------------------------
Post analysis hook
Replace IP by reverse DNS names
Group top pages by subdomains
Plugin requirements :
None
post_analysis/top_pages
Conf values needed :
reverse_dns_timeout*
None
Output files :
None
Statistics creation :
None
month_stats:
subdomains =>
domain => count
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
None
Statistics deletion :
None
@ -1011,6 +1096,45 @@ plugins.post_analysis.top_pages
None
plugins.pre_analysis.feeds
--------------------------
Pre analysis hook
Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, feeds parsers with the same user agent are merged,
as they are most likely the same person with a different IP address.
Warning : When merge_feeds_parsers is activated, the displayed last access date is the most
recent date of all merged parsers found
Plugin requirements :
None
Conf values needed :
feeds
feeds_agents*
merge_feeds_parsers*
Output files :
None
Statistics creation :
remote_ip =>
feed_parser
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers
Statistics update :
None
Statistics deletion :
None
plugins.pre_analysis.page_to_hit
--------------------------------
@ -1039,6 +1163,35 @@ plugins.pre_analysis.page_to_hit
None
plugins.pre_analysis.reverse_dns
--------------------------------
Pre analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
robot_domains*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.pre_analysis.robots
---------------------------
@ -1050,7 +1203,8 @@ plugins.pre_analysis.robots
None
Conf values needed :
None
count_hit_only_visitors
no_referrer_domains
Output files :
None

iwla.py

@ -32,6 +32,7 @@ import logging
import gettext
from calendar import monthrange
from datetime import date, datetime
import socket
import default_conf as conf
@ -50,8 +51,13 @@ Conf values needed :
analyzed_filename
domain_name
locales_path
locale
keep_requests*
compress_output_files
excluded_ip
excluded_domain_name
reverse_dns_timeout*
ignore_url*
Output files :
DB_ROOT/meta.db
@ -92,7 +98,7 @@ days_stats :
nb_visitors
visits :
remote_addr =>
remote_ip =>
remote_addr
remote_ip
viewed_pages{0..31} # 0 contains total
@ -132,9 +138,10 @@ class IWLA(object):
ANALYSIS_CLASS = 'HTTP'
API_VERSION = 1
IWLA_VERSION = '0.6'
IWLA_VERSION = '0.8'
DEFAULT_DNS_TIMEOUT = 0.5
def __init__(self, logLevel, dry_run):
def __init__(self, logLevel, args):
self.meta_infos = {}
self.analyse_started = False
self.current_analysis = {}
@ -142,8 +149,11 @@ class IWLA(object):
self.cache_plugins = {}
self.display = DisplayHTMLBuild(self)
self.valid_visitors = None
self.dry_run = dry_run
self.args = args
self.reverse_dns_timeout = self.getConfValue('reverse_dns_timeout',
IWLA.DEFAULT_DNS_TIMEOUT)
self.log_format_extracted = re.sub(r'([^\$?\w])', r'\\\g<1>', conf.log_format)
self.log_format_extracted = re.sub(r'\$(\w+)', '(?P<\g<1>>.+)', self.log_format_extracted)
self.http_request_extracted = re.compile(r'(?P<http_method>\S+) (?P<http_uri>\S+) (?P<http_version>\S+)')
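The two substitutions above turn a `$name`-style log format into a regex with named capture groups. As a standalone illustration (the `log_format` value here is hypothetical, not iwla's default):

```python
import re

# Hypothetical $variable-style log format (conf.log_format in iwla).
log_format = '$remote_addr - [$time_local] "$request"'

# Escape every character that is not a word character, '$' or '?'...
extracted = re.sub(r'([^\$?\w])', r'\\\g<1>', log_format)
# ...then replace each $name with a named capture group.
extracted = re.sub(r'\$(\w+)', r'(?P<\g<1>>.+)', extracted)

m = re.match(extracted, '192.0.2.1 - [10/Oct/2024] "GET / HTTP/1.1"')
print(m.group('remote_addr'))  # -> 192.0.2.1
print(m.group('request'))      # -> GET / HTTP/1.1
```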
@ -155,13 +165,22 @@ class IWLA(object):
self.excluded_ip = []
for ip in conf.excluded_ip:
self.excluded_ip += [re.compile(ip)]
self.excluded_domain_name = []
for domain_name in conf.excluded_domain_name:
self.excluded_domain_name += [re.compile(domain_name)]
self.ignore_url = []
for url in conf.ignore_url:
self.ignore_url += [re.compile(url)]
self.multimedia_files_re = []
for file_re in conf.multimedia_files_re:
self.multimedia_files_re += [re.compile(file_re)]
self.plugins = [(conf.PRE_HOOK_DIRECTORY , conf.pre_analysis_hooks),
(conf.POST_HOOK_DIRECTORY , conf.post_analysis_hooks),
(conf.DISPLAY_HOOK_DIRECTORY , conf.display_hooks)]
logging.basicConfig(format='%(name)s %(message)s', level=logLevel)
self.logger = logging.getLogger(self.__class__.__name__)
if self.dry_run:
if self.args.dry_run:
self.logger.info('==> Start (DRY RUN)')
else:
self.logger.info('==> Start')
@ -235,6 +254,26 @@ class IWLA(object):
def getCSSPath(self):
return conf.css_path
def reverseDNS(self, hit):
if hit.get('dns_name_replaced', False):
return hit['remote_addr']
try:
timeout = socket.getdefaulttimeout()
if timeout != self.reverse_dns_timeout:
socket.setdefaulttimeout(self.reverse_dns_timeout)
name, _, _ = socket.gethostbyaddr(hit['remote_ip'])
if timeout != self.reverse_dns_timeout:
socket.setdefaulttimeout(timeout)
hit['remote_addr'] = name.lower()
hit['dns_name_replaced'] = True
except socket.herror:
pass
finally:
hit['dns_analysed'] = True
return hit['remote_addr']
def _clearMeta(self):
self.meta_infos = {
'last_time' : None,
@ -256,7 +295,8 @@ class IWLA(object):
return gzip.open(filename, prot)
def _serialize(self, obj, filename):
if self.dry_run: return
if self.args.dry_run: return
self.logger.info("==> Serialize to %s" % (filename))
base = os.path.dirname(filename)
if not os.path.exists(base):
os.makedirs(base)
@ -299,16 +339,25 @@ class IWLA(object):
if request.endswith(e):
self.logger.debug("True")
return True
# No extension -> page
if not '.' in request.split('/')[-1]:
self.logger.debug("True")
return True
self.logger.debug("False")
return False
def isMultimediaFile(self, request):
self.logger.debug("Is multimedia %s" % (request))
def isMultimediaFile(self, uri):
self.logger.debug("Is multimedia %s" % (uri))
for e in conf.multimedia_files:
if request.lower().endswith(e):
if uri.lower().endswith(e):
self.logger.debug("True")
return True
self.logger.debug("False")
for file_re in self.multimedia_files_re:
if file_re.match(uri):
self.logger.debug("Is multimedia re True")
return True
return False
def isValidVisitor(self, hit):
@ -318,21 +367,32 @@ class IWLA(object):
return True
def isRobot(self, hit):
return hit['robot']
# By default robot is None
return hit['robot'] == True
def _appendHit(self, hit):
remote_addr = hit['remote_addr']
if not remote_addr: return
# Redirected page/hit
if int(hit['status']) in (301, 302, 307, 308):
return
remote_ip = hit['remote_ip']
if not remote_ip: return
for ip in self.excluded_ip:
if ip.match(remote_addr):
if ip.match(remote_ip):
return
if not remote_addr in self.current_analysis['visits'].keys():
request = hit['extract_request']
uri = request.get('extract_uri', request['http_uri'])
for url in self.ignore_url:
if url.match(uri):
return
if not remote_ip in self.current_analysis['visits'].keys():
self._createVisitor(hit)
super_hit = self.current_analysis['visits'][remote_addr]
super_hit = self.current_analysis['visits'][remote_ip]
# Don't keep all requests for robots
if not super_hit['robot']:
super_hit['requests'].append(hit)
@ -343,10 +403,6 @@ class IWLA(object):
super_hit['bandwidth'][0] += int(hit['body_bytes_sent'])
super_hit['last_access'] = self.meta_infos['last_time']
request = hit['extract_request']
uri = request.get('extract_uri', request['http_uri'])
hit['is_page'] = self.isPage(uri)
if super_hit['robot'] or\
@ -375,17 +431,18 @@ class IWLA(object):
super_hit['bandwidth'] = {0:0}
super_hit['last_access'] = self.meta_infos['last_time']
super_hit['requests'] = []
super_hit['robot'] = False
super_hit['robot'] = None
super_hit['hit_only'] = 0
def _normalizeURI(self, uri, removeFileSlash=False):
def _normalizeURI(self, uri, removeFileSlash=True):
if uri == '/': return uri
# Remove protocol
uri = self.protocol_re.sub('', uri)
# Remove double /
uri = self.slash_re.sub('/', uri)
if removeFileSlash and uri[-1] == '/':
uri = uri[:-1]
if removeFileSlash:
while len(uri) > 1 and uri[-1] == '/':
uri = uri[:-1]
return uri
def _normalizeParameters(self, parameters):
@ -416,8 +473,11 @@ class IWLA(object):
referer_groups = self.uri_re.match(hit['http_referer'])
if referer_groups:
hit['extract_referer'] = referer_groups.groupdict("")
hit['extract_referer']['extract_uri'] = self._normalizeURI(hit['extract_referer']['extract_uri'], True)
hit['extract_referer']['extract_uri'] = self._normalizeURI(hit['extract_referer']['extract_uri'])
hit['extract_referer']['extract_parameters'] = self._normalizeParameters(hit['extract_referer']['extract_parameters'])
hit['remote_ip'] = hit['remote_addr']
return True
def _decodeTime(self, hit):
@ -454,14 +514,16 @@ class IWLA(object):
link = DisplayHTMLRaw(self, '<iframe src="../_stats.html"></iframe>')
page.appendBlock(link)
months_name = ['', self._('Jan'), self._('Feb'), self._('Mar'), self._('Apr'), self._('May'), self._('June'), self._('Jul'), self._('Aug'), self._('Sep'), self._('Oct'), self._('Nov'), self._('Dec')]
_, nb_month_days = monthrange(cur_time.tm_year, cur_time.tm_mon)
days = self.display.createBlock(DisplayHTMLBlockTableWithGraph, self._('By day'), [self._('Day'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth')], None, nb_month_days, range(1,6), [4, 5])
days.setColsCSSClass(['', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
nb_visits = 0
nb_days = 0
for i in range(1, nb_month_days+1):
day = '%d<br/>%s' % (i, time.strftime('%b', cur_time))
full_day = '%02d %s %d' % (i, time.strftime('%b', cur_time), cur_time.tm_year)
month = months_name[int(time.strftime('%m', cur_time), 10)]
day = '%d<br/>%s' % (i, month)
full_day = '%02d %s %d' % (i, month, cur_time.tm_year)
if i in self.current_analysis['days_stats'].keys():
stats = self.current_analysis['days_stats'][i]
row = [full_day, stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
@ -506,52 +568,40 @@ class IWLA(object):
cur_time = time.localtime()
months_name = ['', self._('Jan'), self._('Feb'), self._('Mar'), self._('Apr'), self._('May'), self._('June'), self._('Jul'), self._('Aug'), self._('Sep'), self._('Oct'), self._('Nov'), self._('Dec')]
title = '%s %d' % (self._('Summary'), year)
cols = [self._('Month'), self._('Visitors'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth'), self._('Details')]
graph_cols=range(1,7)
cols = [self._('Month'), self._('Visitors'), self._('Visits'), self._('Pages'), self._('Hits'), self._('Bandwidth'), self._('Not viewed Bandwidth')]
graph_cols=range(1,6)
months = self.display.createBlock(DisplayHTMLBlockTableWithGraph, title, cols, None, 12, graph_cols, [5, 6])
months.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth', ''])
months_ = self.display.createBlock(DisplayHTMLBlockTableWithGraph, title, cols[:-1], None, 12, graph_cols[:-1], [5, 6])
months_.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
months.setColsCSSClass(['', 'iwla_visitor', 'iwla_visit', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', 'iwla_bandwidth'])
total = [0] * len(cols)
for i in range(1, 13):
month = '%s<br/>%d' % (months_name[i], year)
full_month = '%s %d' % (months_name[i], year)
link_month = '<a target="_top" href="/%d/%02d/index.html">%s</a>' % (year, i, full_month)
if i in month_stats.keys():
stats = month_stats[i]
link = '<a href="%d/%02d/index.html">%s</a>' % (year, i, self._('Details'))
row = [full_month, stats['nb_visitors'], stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
stats['viewed_bandwidth'], stats['not_viewed_bandwidth'], link]
for j in graph_cols:
row = [link_month, stats['nb_visitors'], stats['nb_visits'], stats['viewed_pages'], stats['viewed_hits'],
stats['viewed_bandwidth'], stats['not_viewed_bandwidth']]
for j in range(1,7):
total[j] += row[j]
else:
row = [full_month, 0, 0, 0, 0, 0, 0, '']
row = [full_month, 0, 0, 0, 0, 0, 0]
months.appendRow(row)
viewed_bandwidth = row[5]
not_viewed_bandwidth = row[6]
months.setCellValue(i-1, 5, viewed_bandwidth)
months.setCellValue(i-1, 6, not_viewed_bandwidth)
months.appendShortTitle(month)
months_.appendRow(row[:-1])
months_.setCellValue(i-1, 5, viewed_bandwidth)
months_.setCellValue(i-1, 6, not_viewed_bandwidth)
months_.appendShortTitle(month)
if year == cur_time.tm_year and i == cur_time.tm_mon:
css = months.getCellCSSClass(i-1, 0)
if css: css = '%s %s' % (css, 'iwla_curday')
else: css = 'iwla_curday'
months.setCellCSSClass(i-1, 0, css)
months_.setCellCSSClass(i-1, 0, css)
total[0] = self._('Total')
total[7] = u''
months.appendRow(total)
page.appendBlock(months)
months_.appendRow(total[:-1])
filename = '%d/_stats.html' % (year)
page_ = self.display.createPage(u'', filename, conf.css_path)
page_.appendBlock(months_)
page_.appendBlock(months)
page_.build(conf.DISPLAY_ROOT, False)
months.resetHTML()
def _generateDisplayWholeMonthStats(self):
title = '%s %s' % (self._('Statistics for'), conf.domain_name)
@ -584,7 +634,7 @@ class IWLA(object):
if not os.path.exists(gz_path) or\
os.stat(path).st_mtime > os.stat(gz_path).st_mtime:
if self.dry_run: return
if self.args.dry_run: return
with open(path, 'rb') as f_in, gzip.open(gz_path, 'wb') as f_out:
f_out.write(f_in.read())
@ -598,9 +648,11 @@ class IWLA(object):
break
def _generateDisplay(self):
if self.args.disable_display: return
self._generateDisplayDaysStats()
self._callPlugins(conf.DISPLAY_HOOK_DIRECTORY)
self._generateDisplayWholeMonthStats()
if self.args.dry_run: return
self.display.build(conf.DISPLAY_ROOT)
self._compressFiles(conf.DISPLAY_ROOT)
@ -645,7 +697,7 @@ class IWLA(object):
self._callPlugins(conf.POST_HOOK_DIRECTORY)
if args.display_only:
if self.args.display_only:
if not 'stats' in self.meta_infos.keys():
self.meta_infos['stats'] = {}
self._generateDisplay()
@ -659,7 +711,6 @@ class IWLA(object):
path = self.getDBFilename(cur_time)
self.logger.info("==> Serialize to %s" % (path))
self._serialize(self.current_analysis, path)
# Save month stats
@ -672,7 +723,6 @@ class IWLA(object):
self.meta_infos['stats'][year][month] = duplicated_stats
meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
self.logger.info("==> Serialize to %s" % (meta_path))
self._serialize(self.meta_infos, meta_path)
self._generateDisplay()
@ -708,6 +758,11 @@ class IWLA(object):
self.logger.debug("Not in domain %s" % (hit))
return False
for domain_name in self.excluded_domain_name:
if domain_name.match(hit['server_name']):
self.logger.debug("Domain name %s excluded" % (hit['server_name']))
return False
t = self._decodeTime(hit)
cur_time = self.meta_infos['last_time']
@ -772,8 +827,7 @@ class IWLA(object):
if os.path.exists(output_path): shutil.rmtree(output_path)
month += 1
def start(self, _file, args):
self.args = args
def start(self, _file):
self.start_time = datetime.now()
meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
@ -799,12 +853,15 @@ class IWLA(object):
for l in _file:
# print "line " + l
groups = self.log_re.match(l)
sanitized = l.replace('<', '')
sanitized = sanitized.replace('>', '')
groups = self.log_re.match(sanitized)
if groups:
self._newHit(groups.groupdict(""))
else:
self.logger.warning("No match for %s" % (l))
self.logger.warning("No match for %s" % (sanitized))
#break
if self.analyse_started:
@ -815,6 +872,32 @@ class IWLA(object):
self.logger.info('==> Analyse not started : nothing new')
def displayOnly(self, start_time):
self.start_time = datetime.now()
meta_path = os.path.join(conf.DB_ROOT, conf.META_FILENAME)
if os.path.exists(meta_path):
self.logger.info('==> Load previous database')
self.meta_infos = self._deserialize(meta_path) or self._clearMeta()
self.meta_infos['last_time'] = time.strptime(start_time, '%m/%Y')
if self.meta_infos['last_time']:
self.logger.info('Last time')
self.logger.info(self.meta_infos['last_time'])
self.current_analysis = self._deserialize(self.getDBFilename(self.meta_infos['last_time'])) or self._clearVisits()
else:
self._clearVisits()
self.meta_infos['start_analysis_time'] = None
self.cache_plugins = preloadPlugins(self.plugins, self)
self.logger.info('==> Analysing log')
self._generateDayStats()
self._generateMonthStats()
class FileIter(object):
def __init__(self, filenames):
self.filenames = [f for f in filenames.split(',') if f]
@ -880,9 +963,13 @@ if __name__ == '__main__':
default=False,
help='Don\'t compress databases (bigger but faster, not compatible with compressed databases)')
parser.add_argument('-p', '--display-only', dest='display_only', action='store_true',
parser.add_argument('-p', '--display-only', dest='display_only',
default='', type=str,
help='Only generate display for a specific date (month/year)')
parser.add_argument('-P', '--disable-display', dest='disable_display', action='store_true',
default=False,
help='Only generate display')
help='Don\'t generate display')
parser.add_argument('-D', '--dry-run', dest='dry_run', action='store_true',
default=False,
@ -920,14 +1007,18 @@ if __name__ == '__main__':
if not isinstance(loglevel, int):
raise ValueError('Invalid log level: %s' % (args.loglevel))
iwla = IWLA(loglevel, args.dry_run)
iwla = IWLA(loglevel, args)
required_conf = ['analyzed_filename', 'domain_name']
if not validConfRequirements(required_conf, iwla, 'Main Conf'):
sys.exit(0)
if args.stdin:
iwla.start(sys.stdin, args)
if args.display_only:
iwla.displayOnly(args.display_only)
else:
filename = args.file or conf.analyzed_filename
iwla.start(FileIter(filename), args)
if args.stdin:
iwla.start(sys.stdin)
else:
filename = args.file or conf.analyzed_filename
iwla.start(FileIter(filename))

Binary file not shown.


@ -5,8 +5,8 @@
msgid ""
msgstr ""
"Project-Id-Version: iwla\n"
"POT-Creation-Date: 2022-11-10 20:07+0100\n"
"PO-Revision-Date: 2022-11-10 20:08+0100\n"
"POT-Creation-Date: 2024-03-16 08:52+0100\n"
"PO-Revision-Date: 2025-02-03 09:57+0100\n"
"Last-Translator: Soutadé <soutade@gmail.com>\n"
"Language-Team: iwla\n"
"Language: fr\n"
@ -15,7 +15,7 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
"Generated-By: pygettext.py 1.5\n"
"X-Generator: Poedit 3.1.1\n"
"X-Generator: Poedit 3.5\n"
"X-Poedit-SourceCharset: UTF-8\n"
#: display.py:32
@ -38,11 +38,11 @@ msgstr "Juillet"
msgid "March"
msgstr "Mars"
#: display.py:32 iwla.py:503
#: display.py:32 iwla.py:474 iwla.py:526
msgid "June"
msgstr "Juin"
#: display.py:32 iwla.py:503
#: display.py:32 iwla.py:474 iwla.py:526
msgid "May"
msgstr "Mai"
@ -66,179 +66,175 @@ msgstr "Octobre"
msgid "September"
msgstr "Septembre"
#: display.py:196
#: display.py:207
msgid "Ratio"
msgstr "Pourcentage"
#: iwla.py:446
#: iwla.py:467
msgid "Statistics"
msgstr "Statistiques"
#: iwla.py:454 iwla.py:505
#: iwla.py:474 iwla.py:526
msgid "Apr"
msgstr "Avr"
#: iwla.py:474 iwla.py:526
msgid "Aug"
msgstr "Août"
#: iwla.py:474 iwla.py:526
msgid "Dec"
msgstr "Déc"
#: iwla.py:474 iwla.py:526
msgid "Feb"
msgstr "Fév"
#: iwla.py:474 iwla.py:526
msgid "Jan"
msgstr "Jan"
#: iwla.py:474 iwla.py:526
msgid "Jul"
msgstr "Jui"
#: iwla.py:474 iwla.py:526
msgid "Mar"
msgstr "Mars"
#: iwla.py:474 iwla.py:526
msgid "Nov"
msgstr "Nov"
#: iwla.py:474 iwla.py:526
msgid "Oct"
msgstr "Oct"
#: iwla.py:474 iwla.py:526
msgid "Sep"
msgstr "Sep"
#: iwla.py:476 iwla.py:528
msgid "Not viewed Bandwidth"
msgstr "Traffic non vu"
#: iwla.py:454 iwla.py:505
#: iwla.py:476 iwla.py:528
msgid "Visits"
msgstr "Visites"
#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:76 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:118 plugins/display/hours_stats.py:73
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:123 plugins/display/hours_stats.py:73
#: plugins/display/hours_stats.py:83 plugins/display/referers.py:95
#: plugins/display/referers.py:153 plugins/display/top_visitors.py:72
msgid "Pages"
msgstr "Pages"
#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:76 plugins/display/filter_users.py:118
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:123
#: plugins/display/hours_stats.py:73 plugins/display/hours_stats.py:83
#: plugins/display/referers.py:95 plugins/display/referers.py:153
#: plugins/display/top_downloads.py:97 plugins/display/top_visitors.py:72
msgid "Hits"
msgstr "Hits"
#: iwla.py:454 iwla.py:505 plugins/display/all_visits.py:70
#: iwla.py:476 iwla.py:528 plugins/display/all_visits.py:70
#: plugins/display/hours_stats.py:73 plugins/display/hours_stats.py:83
#: plugins/display/robot_bandwidth.py:81 plugins/display/robot_bandwidth.py:106
#: plugins/display/robot_bandwidth.py:90 plugins/display/robot_bandwidth.py:112
#: plugins/display/top_visitors.py:72
msgid "Bandwidth"
msgstr "Bande passante"
#: iwla.py:454 plugins/display/hours_stats.py:71
#: iwla.py:476 plugins/display/hours_stats.py:71
msgid "By day"
msgstr "Par jour"
#: iwla.py:454 plugins/display/hours_stats.py:73
#: iwla.py:476 plugins/display/hours_stats.py:73
msgid "Day"
msgstr "Jour"
#: iwla.py:493
#: iwla.py:516
msgid "Average"
msgstr "Moyenne"
#: iwla.py:496 iwla.py:541
#: iwla.py:519 iwla.py:553
msgid "Total"
msgstr "Total"
#: iwla.py:503
msgid "Apr"
msgstr "Avr"
#: iwla.py:503
msgid "Aug"
msgstr "Août"
#: iwla.py:503
msgid "Dec"
msgstr "Déc"
#: iwla.py:503
msgid "Feb"
msgstr "Fév"
#: iwla.py:503
msgid "Jan"
msgstr "Jan"
#: iwla.py:503
msgid "Jul"
msgstr "Jui"
#: iwla.py:503
msgid "Mar"
msgstr "Mars"
#: iwla.py:503
msgid "Nov"
msgstr "Nov"
#: iwla.py:503
msgid "Oct"
msgstr "Oct"
#: iwla.py:503
msgid "Sep"
msgstr "Sep"
#: iwla.py:504
#: iwla.py:527
msgid "Summary"
msgstr "Résumé"
#: iwla.py:505
#: iwla.py:528
msgid "Month"
msgstr "Mois"
#: iwla.py:505 iwla.py:517 plugins/display/feeds.py:101
#: plugins/display/filter_users.py:113 plugins/display/operating_systems.py:90
msgid "Details"
msgstr "Détails"
#: iwla.py:505 plugins/display/ip_to_geo.py:94 plugins/display/ip_to_geo.py:112
#: iwla.py:528 plugins/display/ip_to_geo.py:89 plugins/display/ip_to_geo.py:107
msgid "Visitors"
msgstr "Visiteurs"
#: iwla.py:553
#: iwla.py:564
msgid "Statistics for"
msgstr "Statistiques pour"
#: iwla.py:560
#: iwla.py:571
msgid "Last update"
msgstr "Dernière mise à jour"
#: iwla.py:564
#: iwla.py:575
msgid "Time analysis"
msgstr "Durée de l'analyse"
#: iwla.py:566
#: iwla.py:577
msgid "hours"
msgstr "heures"
#: iwla.py:567
#: iwla.py:578
msgid "minutes"
msgstr "minutes"
#: iwla.py:567
#: iwla.py:578
msgid "seconds"
msgstr "secondes"
#: plugins/display/all_visits.py:70 plugins/display/all_visits.py:92
#: plugins/display/all_visits.py:70 plugins/display/all_visits.py:87
#: plugins/display/all_visits_enlight.py:67
msgid "All visits"
msgstr "Toutes les visites"
#: plugins/display/all_visits.py:70 plugins/display/feeds.py:76
#: plugins/display/filter_users.py:118 plugins/display/ip_to_geo.py:62
#: plugins/display/robot_bandwidth.py:81 plugins/display/robot_bandwidth.py:106
#: plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:70 plugins/display/feeds.py:75
#: plugins/display/filter_users.py:123 plugins/display/ip_to_geo.py:62
#: plugins/display/robot_bandwidth.py:90 plugins/display/top_visitors.py:72
#: plugins/display/visitor_ip.py:54
msgid "Host"
msgstr "Hôte"
#: plugins/display/all_visits.py:70 plugins/display/robot_bandwidth.py:81
#: plugins/display/robot_bandwidth.py:106 plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:70 plugins/display/robot_bandwidth.py:90
#: plugins/display/robot_bandwidth.py:112 plugins/display/top_visitors.py:72
msgid "Last seen"
msgstr "Dernière visite"
#: plugins/display/all_visits.py:93 plugins/display/top_visitors.py:72
#: plugins/display/all_visits.py:88 plugins/display/top_visitors.py:72
msgid "Top visitors"
msgstr "Top visiteurs"
#: plugins/display/browsers.py:79
#: plugins/display/browsers.py:92
msgid "Browsers"
msgstr "Navigateurs"
#: plugins/display/browsers.py:79 plugins/display/browsers.py:114
#: plugins/display/browsers.py:92 plugins/display/browsers.py:124
msgid "Browser"
msgstr "Navigateur"
#: plugins/display/browsers.py:79 plugins/display/browsers.py:114
#: plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:95 plugins/display/top_hits.py:71
#: plugins/display/top_hits.py:97 plugins/display/top_pages.py:71
#: plugins/display/top_pages.py:96
#: plugins/display/browsers.py:92 plugins/display/browsers.py:124
#: plugins/display/ip_type.py:63 plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:95 plugins/display/subdomains.py:64
#: plugins/display/top_hits.py:71 plugins/display/top_hits.py:97
#: plugins/display/top_pages.py:71 plugins/display/top_pages.py:96
msgid "Entrance"
msgstr "Entrées"
#: plugins/display/browsers.py:99 plugins/display/browsers.py:130
#: plugins/display/browsers.py:109 plugins/display/browsers.py:137
#: plugins/display/filter_users.py:128 plugins/display/referers.py:110
#: plugins/display/referers.py:125 plugins/display/referers.py:140
#: plugins/display/referers.py:163 plugins/display/referers.py:174
@ -246,42 +242,52 @@ msgstr "Entrées"
#: plugins/display/top_downloads.py:83 plugins/display/top_downloads.py:103
#: plugins/display/top_hits.py:82 plugins/display/top_hits.py:103
#: plugins/display/top_pages.py:82 plugins/display/top_pages.py:102
#: plugins/display/top_visitors.py:92
#: plugins/display/top_visitors.py:87
msgid "Others"
msgstr "Autres"
#: plugins/display/browsers.py:106
#: plugins/display/browsers.py:116
msgid "Top Browsers"
msgstr "Top Navigateurs"
#: plugins/display/browsers.py:108
#: plugins/display/browsers.py:118
msgid "All Browsers"
msgstr "Tous les navigateurs"
#: plugins/display/browsers.py:125 plugins/display/filter_users.py:80
msgid "Unknown"
msgstr "Inconnu"
#: plugins/display/feeds.py:70
msgid "All Feeds parsers"
msgstr "Tous les agrégateurs"
#: plugins/display/feeds.py:76
#: plugins/display/feeds.py:75
msgid "All feeds parsers"
msgstr "Tous les agrégateurs"
#: plugins/display/feeds.py:94
#: plugins/display/feeds.py:75 plugins/display/filter_users.py:77
#: plugins/display/filter_users.py:123
msgid "Last Access"
msgstr "Dernière visite"
#: plugins/display/feeds.py:93
msgid "Merged feeds parsers"
msgstr "Agrégateurs fusionnés"
#: plugins/display/feeds.py:99
#: plugins/display/feeds.py:98
msgid "Feeds parsers"
msgstr "Agrégateurs"
#: plugins/display/feeds.py:106
#: plugins/display/feeds.py:100 plugins/display/filter_users.py:118
#: plugins/display/operating_systems.py:90
msgid "Details"
msgstr "Détails"
#: plugins/display/feeds.py:105
msgid "Found"
msgstr "Trouvé"
#: plugins/display/filter_users.py:77
msgid "Location"
msgstr "Position"
#: plugins/display/filter_users.py:77
msgid "Referer"
msgstr "Origine"
@ -290,17 +296,17 @@ msgstr "Origine"
msgid "User Agent"
msgstr "Navigateur"
#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:111
#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:116
msgid "Filtered users"
msgstr "Utilisateurs filtrés"
#: plugins/display/filter_users.py:77 plugins/display/filter_users.py:118
msgid "Last Access"
msgstr "Dernière visite"
#: plugins/display/filter_users.py:80
msgid "Unknown"
msgstr "Inconnu"
#: plugins/display/hours_stats.py:72
msgid "Fri"
msgstr "Jeu"
msgstr "Ven"
#: plugins/display/hours_stats.py:72
msgid "Mon"
@ -334,19 +340,27 @@ msgstr "Par heures"
msgid "Hours"
msgstr "Heures"
#: plugins/display/ip_to_geo.py:94
#: plugins/display/ip_to_geo.py:89
msgid "Country"
msgstr "Pays"
#: plugins/display/ip_to_geo.py:94 plugins/display/ip_to_geo.py:105
#: plugins/display/ip_to_geo.py:112
#: plugins/display/ip_to_geo.py:89 plugins/display/ip_to_geo.py:100
#: plugins/display/ip_to_geo.py:107
msgid "Countries"
msgstr "Pays"
#: plugins/display/ip_to_geo.py:107
#: plugins/display/ip_to_geo.py:102
msgid "All countries"
msgstr "Tous les pays"
#: plugins/display/ip_type.py:59
msgid "IP types"
msgstr "Type d'IP"
#: plugins/display/ip_type.py:63
msgid "Type"
msgstr "Type"
#: plugins/display/operating_systems.py:78
#: plugins/display/operating_systems.py:88
msgid "Operating Systems"
@ -409,14 +423,33 @@ msgstr "Top phrases clé"
msgid "All key phrases"
msgstr "Toutes les phrases clé"
#: plugins/display/robot_bandwidth.py:99
#: plugins/display/robot_bandwidth.py:90
msgid "Name"
msgstr "Nom"
#: plugins/display/robot_bandwidth.py:105
msgid "Robots bandwidth"
msgstr "Bande passante robots"
#: plugins/display/robot_bandwidth.py:101
#: plugins/display/robot_bandwidth.py:107
msgid "All robots bandwidth"
msgstr "Bande passante tous les robots"
#: plugins/display/robot_bandwidth.py:112
msgid "Robot"
msgstr "Robot"
#: plugins/display/subdomains.py:60
msgid "Subdomains"
msgstr "Sous-domaines"
#: plugins/display/subdomains.py:64 plugins/display/top_downloads.py:71
#: plugins/display/top_downloads.py:97 plugins/display/top_hits.py:71
#: plugins/display/top_hits.py:97 plugins/display/top_pages.py:71
#: plugins/display/top_pages.py:96
msgid "URI"
msgstr "URI"
#: plugins/display/top_downloads.py:71
msgid "Hit"
msgstr "Hit"
@ -426,12 +459,6 @@ msgstr "Hit"
msgid "All Downloads"
msgstr "Tous les téléchargements"
#: plugins/display/top_downloads.py:71 plugins/display/top_downloads.py:97
#: plugins/display/top_hits.py:71 plugins/display/top_hits.py:97
#: plugins/display/top_pages.py:71 plugins/display/top_pages.py:96
msgid "URI"
msgstr "URI"
#: plugins/display/top_downloads.py:89
msgid "Top Downloads"
msgstr "Top Téléchargements"

View File

@ -24,7 +24,7 @@ import json
def geoiplookup(ip):
http = urllib3.PoolManager()
r = http.request('GET', f'https://api.geoiplookup.net/?query={ip}&json=true')
r = http.request('GET', f'http://ip-api.com/json/{ip}')
if r.status != 200:
raise Exception(r)

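The commit swaps the lookup endpoint from api.geoiplookup.net to ip-api.com, which returns a different JSON shape. A minimal sketch of the parsing side (field names follow ip-api.com's documented response; the sample payload and the `extract_location` helper are illustrative, not part of iwla):

```python
import json

# Hypothetical sample of the payload returned by http://ip-api.com/json/<ip>
# (field names per ip-api.com's documentation; the values are invented).
sample = json.loads('''{
    "status": "success",
    "country": "France",
    "countryCode": "FR",
    "city": "Paris",
    "isp": "Example ISP",
    "query": "203.0.113.7"
}''')

def extract_location(data):
    # Keep only the fields the display plugins consume (country/city/isp).
    if data.get('status') != 'success':
        raise ValueError('lookup failed: %s' % data.get('message', 'unknown'))
    return {
        'country': data.get('country', ''),
        'city': data.get('city', ''),
        'isp': data.get('isp', ''),
        'country_code': data.get('countryCode', '').lower(),
    }
```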
View File

@ -71,19 +71,14 @@ class IWLADisplayAllVisits(IPlugin):
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', ''])
for super_hit in last_access:
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
row = [
address,
super_hit['remote_addr'],
super_hit['viewed_pages'][0],
super_hit['viewed_hits'][0],
super_hit['bandwidth'][0],
time.asctime(super_hit['last_access'])
]
table.appendRow(row)
table.appendRow(row, super_hit['remote_ip'])
page.appendBlock(table)
display.addPage(page)

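The rewritten loop passes `super_hit['remote_ip']` as a second argument to `appendRow`, so the raw IP travels with the row instead of being baked into the display string. A toy sketch of a table keeping one opaque object per row (the class is a stand-in, not the actual display.py API):

```python
class BlockTable:
    """Toy stand-in for DisplayHTMLBlockTable's appendRow(row, obj)."""
    def __init__(self):
        self.rows = []
        self.objects = []   # one opaque object (here: the remote IP) per row

    def appendRow(self, row, obj=None):
        self.rows.append(row)
        self.objects.append(obj)

table = BlockTable()
super_hit = {'remote_addr': 'host.example.net', 'remote_ip': '203.0.113.7',
             'viewed_pages': [3], 'viewed_hits': [10], 'bandwidth': [2048]}
table.appendRow([super_hit['remote_addr'],
                 super_hit['viewed_pages'][0],
                 super_hit['viewed_hits'][0],
                 super_hit['bandwidth'][0]],
                super_hit['remote_ip'])
```

Column filters can then read `table.objects[idx]` directly instead of regex-parsing the host column, which is exactly what the enlight and geo plugins below were simplified to do.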
View File

@ -71,15 +71,9 @@ class IWLADisplayAllVisitsEnlight(IPlugin):
return
for (idx, row) in enumerate(block.rows):
# Direct IP
ip = row[0]
if not ip in visitors.keys():
# name [IP]
ip = self.ip_re.match(row[0])
if not ip: continue
ip = ip[1]
if not ip in visitors.keys():
continue
if visitors[ip].get('enlight', False) or\
visitors[ip].get('filtered', False):
remote_ip = block.objects[idx]
if remote_ip is None or not remote_ip in visitors.keys(): continue
visitor = visitors[remote_ip]
if visitor.get('enlight', False) or\
visitor.get('filtered', False):
block.setCellCSSClass(idx, 0, 'iwla_enlight')

View File

@ -22,8 +22,6 @@ from iwla import IWLA
from iplugin import IPlugin
from display import *
import awstats_data
"""
Display hook
@ -50,6 +48,20 @@ Statistics deletion :
None
"""
browser_icons = {
'Android':'android',
'Android browser (PDA/Phone browser)':'android',
'iPhone':'pdaphone',
'IPhone (PDA/Phone browser)':'pdaphone',
'Edge':'edge',
'Chrome':'chrome',
'Safari':'safari',
'Firefox':'firefox',
'Mozilla':'mozilla',
'Internet Explorer':'msie',
'Opera':'opera',
}
class IWLADisplayBrowsers(IPlugin):
def __init__(self, iwla):
super(IWLADisplayBrowsers, self).__init__(iwla)
@ -60,7 +72,6 @@ class IWLADisplayBrowsers(IPlugin):
self.icon_path = self.iwla.getConfValue('icon_path', '/')
self.max_browsers = self.iwla.getConfValue('max_browsers_displayed', 0)
self.create_browsers = self.iwla.getConfValue('create_browsers_page', True)
self.icon_names = {v:k for (k, v) in awstats_data.browsers_hashid.items()}
return True
@ -81,15 +92,12 @@ class IWLADisplayBrowsers(IPlugin):
total_browsers = [0]*3
new_list = self.max_browsers and browsers[:self.max_browsers] or browsers
for (browser, entrance) in new_list:
if browser != 'unknown':
try:
name = awstats_data.browsers_icons[self.icon_names[browser]]
icon = '<img alt="%s icon" src="/%s/browser/%s.png"/>' % (name, self.icon_path, name)
except:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
if browser in browser_icons.keys():
name = browser_icons[browser]
icon = f'<img alt="{browser} icon" src="/{self.icon_path}/browser/{name}.png"/>'
else:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
browser = 'Unknown'
icon = f'<img alt="Unknown browser icon" src="/{self.icon_path}/browser/unknown.png"/>'
browser = self.iwla._(browser)
table.appendRow([icon, browser, entrance])
total_browsers[2] += entrance
if self.max_browsers:
@ -114,15 +122,12 @@ class IWLADisplayBrowsers(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, title, ['', self.iwla._(u'Browser'), self.iwla._(u'Entrance')])
table.setColsCSSClass(['', '', 'iwla_hit'])
for (browser, entrance) in browsers[:10]:
if browser != 'unknown':
try:
name = awstats_data.browsers_icons[self.icon_names[browser]]
icon = '<img alt="%s icon" src="/%s/browser/%s.png"/>' % (name, self.icon_path, name)
except:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
if browser in browser_icons.keys():
name = browser_icons[browser]
icon = f'<img alt="{browser} icon" src="/{self.icon_path}/browser/{name}.png"/>'
else:
icon = '<img alt="Unknown browser icon" src="/%s/browser/unknown.png"/>' % (self.icon_path)
browser = self.iwla._(u'Unknown')
icon = f'<img alt="Unknown browser icon" src="/{self.icon_path}/browser/unknown.png"/>'
browser = self.iwla._(browser)
table.appendRow([icon, browser, entrance])
total_browsers[2] -= entrance
if total_browsers[2]:

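The refactoring drops the awstats_data indirection in favor of a direct `browser_icons` lookup with an `unknown.png` fallback. The selection logic, reduced to a standalone sketch (the icon path and the trimmed icon table are placeholders):

```python
browser_icons = {
    'Android': 'android',
    'Edge': 'edge',
    'Chrome': 'chrome',
    'Firefox': 'firefox',
}

def browser_icon_html(browser, icon_path='resources/icon'):
    # Known browsers map to a named PNG; anything else falls back to the
    # generic unknown icon, as in both display loops of the plugin.
    if browser in browser_icons:
        name = browser_icons[browser]
        return f'<img alt="{browser} icon" src="/{icon_path}/browser/{name}.png"/>'
    return f'<img alt="Unknown browser icon" src="/{icon_path}/browser/unknown.png"/>'
```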
View File

@ -59,7 +59,7 @@ class IWLADisplayFeeds(IPlugin):
return True
def hook(self):
from plugins.post_analysis.feeds import IWLAPostAnalysisFeeds
from plugins.pre_analysis.feeds import IWLAPostAnalysisFeeds
display = self.iwla.getDisplay()
hits = self.iwla.getCurrentVisits()
@ -70,25 +70,37 @@ class IWLADisplayFeeds(IPlugin):
title = createCurTitle(self.iwla, self.iwla._(u'All Feeds parsers'))
filename = 'all_feeds.html'
path = self.iwla.getCurDisplayPath(filename)
display_visitor_ip = self.iwla.getConfValue('display_visitor_ip', False)
page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'All feeds parsers'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', ''])
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'All feeds parsers'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits')
, self.iwla._(u'Domain'), self.iwla._(u'Subscribers'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', '', '', ''])
rows = []
for super_hit in hits.values():
if not super_hit.get('feed_parser', False): continue
if super_hit['feed_parser'] == IWLAPostAnalysisFeeds.BAD_FEED_PARSER:
if super_hit.get('feed_parser', None) not in (IWLAPostAnalysisFeeds.FEED_PARSER,\
IWLAPostAnalysisFeeds.MERGED_FEED_PARSER):
continue
nb_feeds_parsers += 1
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
if super_hit['feed_parser'] == IWLAPostAnalysisFeeds.MERGED_FEED_PARSER:
address += '*'
address += ' *'
pages = super_hit['not_viewed_pages'][0] + super_hit['viewed_pages'][0]
hits = super_hit['not_viewed_hits'][0] + super_hit['viewed_hits'][0]
table.appendRow([address, pages, hits, time.asctime(super_hit['last_access'])])
last_access = super_hit.get('feed_parser_last_access', super_hit['last_access'])
feed_domain = super_hit.get('feed_domain', '')
if feed_domain:
link = '<a href=\'https://%s/%s\'>%s</a>' % (feed_domain, super_hit.get('feed_uri', ''), feed_domain)
else:
link = ''
subscribers = super_hit.get('feed_subscribers', 0)
# Don't overload interface
if subscribers <= 1: subscribers = ''
row = [address, pages, hits, link, subscribers, time.asctime(last_access),
super_hit['remote_ip'], last_access]
rows.append(row)
rows = sorted(rows, key=lambda t: t[7], reverse=True)
for row in rows:
table.appendRow(row[:6], row[6])
page.appendBlock(table)
note = DisplayHTMLRaw(self.iwla, ('<small>*%s</small>' % (self.iwla._(u'Merged feeds parsers'))))
page.appendBlock(note)

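Feed rows are now buffered and sorted on the raw `last_access` value kept at the end of each row before the displayable slice is emitted. The same buffer-then-sort pattern in isolation (data invented; the real plugin sorts on `t[7]` and displays `row[:6]`):

```python
import time

# Each entry: displayable columns first, then the sort key (a struct_time).
rows = [
    (['feed.old.example', 2, 5], time.strptime('2024-12-01', '%Y-%m-%d')),
    (['feed.new.example', 1, 3], time.strptime('2025-02-01', '%Y-%m-%d')),
]
rows.sort(key=lambda t: t[1], reverse=True)   # most recently seen parser first
display_rows = [cols for (cols, _last_access) in rows]
```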
View File

@ -74,25 +74,24 @@ class IWLADisplayFilterUsers(IPlugin):
path = self.iwla.getCurDisplayPath(filename)
page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Filtered users'), [self.iwla._(u'Pages'), self.iwla._(u'Last Access'), self.iwla._(u'User Agent'), self.iwla._(u'Referer')])
table.setColsCSSClass(['iwla_page', '', '', ''])
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Filtered users'), [self.iwla._(u'Pages'), self.iwla._(u'Last Access'), self.iwla._(u'User Agent'), self.iwla._(u'Referer'), self.iwla._(u'Location')])
table.setColsCSSClass(['iwla_page', '', '', '', ''])
row = 0
unknown = self.iwla._('Unknown')
for filtered_user in self.filtered_users:
ip = filtered_user['remote_ip']
ip_title = ip
if 'dns_name_replaced' in hits[ip].keys():
ip_title = '%s [%s]' % (hits[ip]['remote_addr'], ip)
location = filtered_user.get('geo_location', {})
if location:
city = location.get('city', unknown)
country = location.get('countryname', unknown)
if not city: city = unknown
if not country: country = unknown
# At least, one information
if city != unknown or country != unknown:
ip_title = f'{ip_title}<br/>({city}/{country})'
table.appendRow([f'<b>{ip_title}</b>', '', ''])
isp = location.get('isp', '')
str_location = ''
city = location.get('city', unknown)
country = location.get('country', unknown)
if location.get('city', '') or location.get('country', ''):
str_location = f'{city}/{country}'
if isp:
if str_location: str_location += '<br/>'
str_location += isp
table.appendRow([f'<b>{ip_title}</b>', '', '', '', ''])
table.setCellCSSClass(row, 0, '')
for r in hits[ip]['requests'][::-1]:
uri = r['extract_request']['extract_uri'].lower()
@ -107,7 +106,8 @@ class IWLADisplayFilterUsers(IPlugin):
referer = ''
uri = "%s%s" % (r.get('server_name', ''),
r['extract_request']['extract_uri'])
table.appendRow([generateHTMLLink(uri), time.asctime(r['time_decoded']), r['http_user_agent'], referer])
table.appendRow([generateHTMLLink(uri), time.asctime(r['time_decoded']), r['http_user_agent'], referer, str_location], filtered_user['remote_ip'])
str_location = ''
page.appendBlock(table)
display.addPage(page)
@ -123,12 +123,7 @@ class IWLADisplayFilterUsers(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Last Access')])
table.setColsCSSClass(['', '', 'iwla_page', 'iwla_hit'])
for filtered_user in self.filtered_users[:10]:
ip = filtered_user['remote_ip']
if 'dns_name_replaced' in hits[ip].keys():
ip_title = '%s [%s]' % (hits[ip]['remote_addr'], ip)
else:
ip_title = ip
table.appendRow([ip_title, filtered_user['viewed_pages'][0], filtered_user['viewed_hits'][0], time.asctime(hits[ip]['last_access'])])
table.appendRow([filtered_user['remote_addr'], filtered_user['viewed_pages'][0], filtered_user['viewed_hits'][0], time.asctime(filtered_user['last_access'])], filtered_user['remote_ip'])
if len(self.filtered_users) > 10:
table.appendRow([self.iwla._(u'Others'), len(self.filtered_users)-10, '', ''])
table.setCellCSSClass(table.getNbRows()-1, 0, 'iwla_others')

View File

@ -64,19 +64,14 @@ class IWLADisplayIPToGeo(IPlugin):
return True
@staticmethod # Needed to have unbound method
def FlagFilter(host, self):
cc = None
host_name = host.split(' ')[0] # hostname or ip
if host_name in self.visitors.keys():
cc = self.visitors[host_name].get('country_code', None)
else:
for visitor in self.visitors.values():
if visitor['remote_addr'] == host_name:
cc = visitor.get('country_code', None)
break
if not cc or cc == 'ip': return None
def FlagFilter(host, remote_ip, self):
if remote_ip is None or not remote_ip in self.visitors.keys():
return None
visitor = self.visitors[remote_ip]
cc = visitor.get('country_code', None)
if not cc: return None
icon = '<img alt="%s flag" src="/%s/flags/%s.png"/>' % (cc, self.icon_path, cc)
return '%s %s' % (icon ,host)
return '%s %s' % (icon, host)
def hook(self):
display = self.iwla.getDisplay()

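With the per-row object in place, `FlagFilter` no longer splits and scans the displayed host string; the attached IP is the dictionary key. A self-contained sketch of the rewritten lookup (visitor data and icon path are invented):

```python
visitors = {
    '198.51.100.4': {'remote_addr': 'host.example.net', 'country_code': 'fr'},
}

def flag_filter(host, remote_ip, icon_path='resources/icon'):
    # Mirrors the new FlagFilter: the row's attached IP indexes the
    # visitors dict, so no parsing of the displayed host is needed.
    visitor = visitors.get(remote_ip)
    if visitor is None:
        return None
    cc = visitor.get('country_code')
    if not cc:
        return None
    icon = f'<img alt="{cc} flag" src="/{icon_path}/flags/{cc}.png"/>'
    return f'{icon} {host}'
```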
View File

@ -0,0 +1,69 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
from iwla import IWLA
from iplugin import IPlugin
from display import *
"""
Display hook
Add IPv4/IPv6 statistics
Plugin requirements :
post_analysis/ip_type
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
"""
class IWLADisplayIPType(IPlugin):
def __init__(self, iwla):
super(IWLADisplayIPType, self).__init__(iwla)
self.requires = ['IWLAPostAnalysisIPType']
def hook(self):
display = self.iwla.getDisplay()
ip_types = self.iwla.getMonthStats()['ip_type']
# IP types in index
title = self.iwla._(u'IP types')
index = self.iwla.getDisplayIndex()
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._('Type'), self.iwla._(u'Entrance')])
table.setColsCSSClass(['', 'iwla_hit'])
types = sorted(ip_types.items(), key=lambda t: t[0])
for (_type, count) in types:
table.appendRow([_type, count])
table.computeRatio(1)
index.appendBlock(table)

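The new display plugin sorts the `{4: n4, 6: n6}` counters by key and lets `computeRatio` turn counts into percentages. The arithmetic, sketched without the display classes (counts invented):

```python
ip_types = {6: 42, 4: 158}

total = sum(ip_types.values())
# IPv4 row first, each with its share of all visitors.
rows = [(f'IPv{t}', count, round(100.0 * count / total, 1))
        for (t, count) in sorted(ip_types.items())]
```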
View File

@ -33,7 +33,6 @@ Plugin requirements :
None
Conf values needed :
display_visitor_ip*
create_all_robot_bandwidth_page*
Output files :
@ -54,7 +53,6 @@ class IWLADisplayRobotBandwidth(IPlugin):
def __init__(self, iwla):
super(IWLADisplayRobotBandwidth, self).__init__(iwla)
self.API_VERSION = 1
self.display_visitor_ip = self.iwla.getConfValue('display_visitor_ip', False)
self.create_all_pages = self.iwla.getConfValue('create_all_robot_bandwidth_page', True)
def load(self):
@ -65,11 +63,22 @@ class IWLADisplayRobotBandwidth(IPlugin):
hits = self.iwla.getCurrentVisits()
bandwidths = []
bandwidths_group = {}
for (k, super_hit) in hits.items():
if not self.iwla.isRobot(super_hit):
continue
bandwidths.append((super_hit, super_hit['bandwidth'][0]))
bandwidths.sort(key=lambda tup: tup[1], reverse=True)
address = super_hit.get('robot_name', '') or super_hit['remote_addr']
if address in bandwidths_group.keys():
group = bandwidths_group[address]
if group['last_access'] < super_hit['last_access']:
group['last_access'] = super_hit['last_access']
group['bandwidth'] += super_hit['bandwidth'][0]
else:
bandwidths_group[address] = {
'last_access':super_hit['last_access'],
'bandwidth':super_hit['bandwidth'][0]
}
# All in a page
if self.create_all_pages:
@ -78,17 +87,14 @@ class IWLADisplayRobotBandwidth(IPlugin):
path = self.iwla.getCurDisplayPath(filename)
page = display.createPage(title, path, self.iwla.getConfValue('css_path', []))
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', ''])
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Name'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', '', ''])
for (super_hit, bandwidth) in bandwidths:
address = super_hit['remote_addr']
if self.display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
row = [
address,
bandwidth,
super_hit.get('robot_name', ''),
time.asctime(super_hit['last_access'])
]
table.appendRow(row)
@ -103,19 +109,16 @@ class IWLADisplayRobotBandwidth(IPlugin):
# Top in index
index = self.iwla.getDisplayIndex()
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Host'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._(u'Robot'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [1])
table.setColsCSSClass(['', 'iwla_bandwidth', ''])
for (super_hit, bandwidth) in bandwidths[:10]:
address = super_hit['remote_addr']
if self.display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
_bandwidths_group = dict(sorted(bandwidths_group.items(), key=lambda g: g[1]['bandwidth'], reverse=True))
for i, (k, group) in enumerate(_bandwidths_group.items()):
if i >= 10: break
row = [
address,
bandwidth,
time.asctime(super_hit['last_access'])
k,
group['bandwidth'],
time.asctime(group['last_access'])
]
table.appendRow(row)
index.appendBlock(table)

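The index table now aggregates robots by `robot_name` (falling back to the address), summing bandwidth and keeping the latest access per group. The accumulation in isolation (the real plugin compares `struct_time` values; plain ints stand in here, and the sample hits are invented):

```python
hits = [
    {'robot_name': 'ExampleBot', 'remote_addr': '192.0.2.1',
     'bandwidth': 1000, 'last_access': 100},
    {'robot_name': 'ExampleBot', 'remote_addr': '192.0.2.2',
     'bandwidth': 500, 'last_access': 200},
    {'robot_name': '', 'remote_addr': '198.51.100.9',
     'bandwidth': 300, 'last_access': 50},
]

groups = {}
for hit in hits:
    # Group by robot name when known, otherwise by address.
    key = hit['robot_name'] or hit['remote_addr']
    group = groups.setdefault(key, {'bandwidth': 0, 'last_access': 0})
    group['bandwidth'] += hit['bandwidth']
    group['last_access'] = max(group['last_access'], hit['last_access'])
```

Sorting `groups.items()` by `bandwidth` descending then yields the "top 10" rows shown on the index page.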
View File

@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
from iwla import IWLA
from iplugin import IPlugin
from display import *
"""
Display hook
Add subdomains statistics
Plugin requirements :
post_analysis/subdomains
Conf values needed :
None
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
"""
class IWLADisplaySubDomains(IPlugin):
def __init__(self, iwla):
super(IWLADisplaySubDomains, self).__init__(iwla)
self.requires = ['IWLAPostAnalysisSubDomains']
def hook(self):
display = self.iwla.getDisplay()
subdomains = self.iwla.getMonthStats()['subdomains']
# Subdomains in index
title = self.iwla._(u'Subdomains')
index = self.iwla.getDisplayIndex()
table = display.createBlock(DisplayHTMLBlockTable, title, [self.iwla._('URI'), self.iwla._(u'Entrance')])
table.setColsCSSClass(['', 'iwla_hit'])
subdomains = sorted(subdomains.items(), key=lambda t: t[1], reverse=True)
for (uri, count) in subdomains:
table.appendRow([uri, count])
table.computeRatio(1)
index.appendBlock(table)

View File

@ -33,7 +33,7 @@ Plugin requirements :
None
Conf values needed :
display_visitor_ip*
None
Output files :
OUTPUT_ROOT/year/month/index.html
@ -72,13 +72,8 @@ class IWLADisplayTopVisitors(IPlugin):
table = display.createBlock(DisplayHTMLBlockTable, self.iwla._(u'Top visitors'), [self.iwla._(u'Host'), self.iwla._(u'Pages'), self.iwla._(u'Hits'), self.iwla._(u'Bandwidth'), self.iwla._(u'Last seen')], [3])
table.setColsCSSClass(['', 'iwla_page', 'iwla_hit', 'iwla_bandwidth', ''])
for super_hit in top_visitors:
address = super_hit['remote_addr']
if display_visitor_ip and\
super_hit.get('dns_name_replaced', False):
address = '%s [%s]' % (address, super_hit['remote_ip'])
row = [
address,
super_hit['remote_addr'],
super_hit['viewed_pages'][0],
super_hit['viewed_hits'][0],
super_hit['bandwidth'][0],
@ -87,7 +82,7 @@ class IWLADisplayTopVisitors(IPlugin):
total[1] -= super_hit['viewed_pages'][0]
total[2] -= super_hit['viewed_hits'][0]
total[3] -= super_hit['bandwidth'][0]
table.appendRow(row)
table.appendRow(row, super_hit['remote_ip'])
if total[1] or total[2] or total[3]:
total[0] = self.iwla._(u'Others')
total[4] = ''

View File

@ -0,0 +1,85 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
from ipaddress import ip_address
from iwla import IWLA
from iplugin import IPlugin
from display import *
"""
Display hook
Display IP below visitor name
Plugin requirements :
None
Conf values needed :
compact_ip*
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
"""
class IWLADisplayVisitorIP(IPlugin):
def load(self):
display = self.iwla.getDisplay()
display.addColumnFilter(self.iwla._(u'Host'), self.IPFilter, {'self':self})
self.compact_ip = self.iwla.getConfValue('compact_ip', False)
return True
def processIP(self, host_name, ip):
host_name = host_name.replace(ip, 'IP')
# IPv4
ip = ip.replace('.', '-')
# IPv6
ip = ip.replace(':', '-')
host_name = host_name.replace(ip, 'IP')
ip = ip.replace('-', '')
host_name = host_name.replace(ip, 'IP')
return host_name
@staticmethod # Needed to have unbound method
def IPFilter(host, remote_ip, self):
if remote_ip is None or not remote_ip in self.visitors.keys(): return None
visitor = self.visitors[remote_ip]
if remote_ip == visitor['remote_addr']: return None
host_name = host
if self.compact_ip:
host_name = self.processIP(host_name, visitor['remote_ip'])
host_name = self.processIP(host_name,
ip_address(visitor['remote_ip']).exploded)
return '%s [%s]' % (host_name, visitor['remote_ip'])
def hook(self):
self.visitors = self.iwla.getCurrentVisits()

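The `compact_ip` option collapses an address embedded in a reverse-DNS name down to the literal `IP`, trying the raw form plus two separator-mangled variants. The replacement chain as a standalone function (behavior as in `processIP`; host names are invented):

```python
def compact_host(host_name, ip):
    # Raw address embedded verbatim in the host name.
    host_name = host_name.replace(ip, 'IP')
    # IPv4 embedded as 192-0-2-1 / IPv6 embedded with '-' for ':'.
    ip = ip.replace('.', '-')
    ip = ip.replace(':', '-')
    host_name = host_name.replace(ip, 'IP')
    # Digits fused together with no separator at all.
    ip = ip.replace('-', '')
    host_name = host_name.replace(ip, 'IP')
    return host_name
```

The caller additionally retries with `ip_address(...).exploded` so abbreviated IPv6 addresses match their fully expanded spelling.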
View File

@ -23,8 +23,6 @@ import re
from iwla import IWLA
from iplugin import IPlugin
import awstats_data
"""
Post analysis hook
@ -41,7 +39,7 @@ Output files :
Statistics creation :
visits :
remote_addr =>
remote_ip =>
browser
month_stats :
@ -55,21 +53,41 @@ Statistics deletion :
None
"""
browser_order = ['android', 'iphone', 'xbox', 'edge', 'opera', 'chrome', 'safari', 'firefox', 'ie', 'mozilla', 'curl', 'wget', 'w3m']
browser_hashid = {
'android':'Android',
'iphone':'iPhone',
'edge':'Edg',
'chrome':['Chrom', 'Chrome'],
'safari':'Safari',
'firefox':'Firefox',
'ie':'MSIE',
'mozilla':'Mozilla',
'opera':'OPR',
'xbox':'Xbox',
'curl':'curl',
'wget':'Wget',
'w3m':'w3m'
}
browser_name = {
'android':'Android',
'iphone':'iPhone',
'edge':'Edge',
'chrome':'Chrome',
'safari':'Safari',
'firefox':'Firefox',
'ie':'Internet Explorer',
'mozilla':'Mozilla',
'opera':'Opera',
'xbox':'Xbox',
'curl':'Curl',
'wget':'Wget',
'w3m':'w3m'
}
class IWLAPostAnalysisBrowsers(IPlugin):
def __init__(self, iwla):
super(IWLAPostAnalysisBrowsers, self).__init__(iwla)
self.API_VERSION = 1
def load(self):
self.browsers = []
for hashid in awstats_data.browsers:
hashid_re = re.compile(r'.*%s.*' % (hashid), re.IGNORECASE)
if hashid in awstats_data.browsers_hashid.keys():
self.browsers.append((hashid_re, awstats_data.browsers_hashid[hashid]))
return True
def hook(self):
stats = self.iwla.getValidVisitors()
@ -81,23 +99,34 @@ class IWLAPostAnalysisBrowsers(IPlugin):
for (k, super_hit) in stats.items():
if not 'browser' in super_hit:
for r in super_hit['requests'][::-1]:
for r in super_hit['requests']:
user_agent = r['http_user_agent']
if not user_agent: continue
browser_name = 'unknown'
for (hashid_re, browser) in self.browsers:
if hashid_re.match(user_agent):
browser_name = browser
break
super_hit['browser'] = browser_name
name = 'Unknown'
for browser in browser_order:
reference = browser_hashid[browser]
if type(reference) == list:
for ref in reference:
if ref in user_agent:
name = browser_name[browser]
break
if name != 'Unknown':
break
else:
if browser_hashid[browser] in user_agent:
name = browser_name[browser]
break
if name == 'Unknown' and 'Macintosh' in user_agent:
name = 'Safari'
super_hit['browser'] = name
break
else:
browser_name = super_hit['browser']
name = super_hit['browser']
if not browser_name in browsers_stats.keys():
browsers_stats[browser_name] = 1
if not name in browsers_stats.keys():
browsers_stats[name] = 1
else:
browsers_stats[browser_name] += 1
browsers_stats[name] += 1
month_stats['browsers'] = browsers_stats

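The new detection replaces per-hashid regexes with ordered substring checks: specific tokens (`Edg`, `OPR`) must be tried before `Chrome`, which in turn precedes the near-universal `Mozilla`, and a Macintosh fallback catches Safari UAs that matched nothing. The core loop, trimmed to a few entries for illustration:

```python
browser_order = ['edge', 'opera', 'chrome', 'safari', 'firefox', 'mozilla']
browser_hashid = {
    'edge': 'Edg', 'opera': 'OPR', 'chrome': ['Chrom', 'Chrome'],
    'safari': 'Safari', 'firefox': 'Firefox', 'mozilla': 'Mozilla',
}
browser_name = {
    'edge': 'Edge', 'opera': 'Opera', 'chrome': 'Chrome',
    'safari': 'Safari', 'firefox': 'Firefox', 'mozilla': 'Mozilla',
}

def detect_browser(user_agent):
    # Ordered substring search: Edge and Opera UAs also contain "Chrome",
    # so the more specific tokens are tested first.
    for browser in browser_order:
        reference = browser_hashid[browser]
        tokens = reference if isinstance(reference, list) else [reference]
        if any(token in user_agent for token in tokens):
            return browser_name[browser]
    if 'Macintosh' in user_agent:
        return 'Safari'
    return 'Unknown'
```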
View File

@ -66,13 +66,13 @@ Output files :
Statistics creation :
visits :
remote_addr =>
remote_ip =>
filtered
geo_location
Statistics update :
visits :
remote_addr =>
remote_ip =>
keep_requests
Statistics deletion :
@ -80,10 +80,6 @@ Statistics deletion :
"""
class IWLAPostAnalysisFilterUsers(IPlugin):
def __init__(self, iwla):
super(IWLAPostAnalysisFilterUsers, self).__init__(iwla)
self.API_VERSION = 1
def _check_filter(self, _filter):
if len(_filter) != 3:
raise Exception('Bad filter ' + ' '.join(_filter))
@ -96,7 +92,7 @@ class IWLAPostAnalysisFilterUsers(IPlugin):
raise Exception('Bad filter ' + ' '.join(_filter))
except Exception as e:
if field == 'ip':
_filter[0] = 'remote_addr'
_filter[0] = 'remote_ip'
if operator not in ('=', '==', '!=', 'in', 'match'):
raise Exception('Bad filter ' + ' '.join(_filter))
if operator == 'match':

View File

@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
import re
from iwla import IWLA
from iplugin import IPlugin
"""
Post analysis hook
Detect if IP is IPv4 or IPv6
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
visits :
remote_ip =>
ip_type
month_stats :
ip_type : {4: XXX, 6: XXX}
Statistics update :
None
Statistics deletion :
None
"""
class IWLAPostAnalysisIPType(IPlugin):
def load(self):
self.v4_re = re.compile(r'([0-9]{1,3}\.){3}[0-9]{1,3}$')
return True
def hook(self):
stats = self.iwla.getValidVisitors()
month_stats = self.iwla.getMonthStats()
if month_stats.get('ip_type', None) is None:
month_stats['ip_type'] = {4:0, 6:0}
for (k, super_hit) in stats.items():
if super_hit.get('ip_type', None) is None:
if self.v4_re.match(super_hit['remote_ip']):
_type = 4
else:
_type = 6
super_hit['ip_type'] = _type
month_stats['ip_type'][_type] += 1

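Classification is a single anchored regex: anything shaped like dotted-quad IPv4 is type 4, everything else is assumed IPv6. The check in isolation (same pattern as the plugin, written as a raw string):

```python
import re

v4_re = re.compile(r'([0-9]{1,3}\.){3}[0-9]{1,3}$')

def ip_type(remote_ip):
    # Dotted quad => IPv4; anything else is treated as IPv6.
    return 4 if v4_re.match(remote_ip) else 6
```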
View File

@ -41,7 +41,7 @@ Output files :
Statistics creation :
visits :
remote_addr =>
remote_ip =>
operating_system
month_stats :

View File

@ -136,6 +136,7 @@ class IWLAPostAnalysisReferers(IPlugin):
for r in super_hit['requests'][::-1]:
if not self.iwla.isValidForCurrentAnalysis(r): break
if not r['http_referer']: continue
if not self.iwla.hasBeenViewed(r): continue
uri = r['extract_referer']['extract_uri']
if self.own_domain_re.match(uri): continue

View File

@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
#
# Copyright Grégory Soutadé 2023
# This file is part of iwla
# iwla is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# iwla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with iwla. If not, see <http://www.gnu.org/licenses/>.
#
import re
from iwla import IWLA
from iplugin import IPlugin
"""
Post analysis hook
Group top pages by subdomains
Plugin requirements :
post_analysis/top_pages
Conf values needed :
None
Output files :
None
Statistics creation :
month_stats:
subdomains =>
domain => count
Statistics update :
None
Statistics deletion :
None
"""
class IWLAPostAnalysisSubDomains(IPlugin):
def __init__(self, iwla):
super(IWLAPostAnalysisSubDomains, self).__init__(iwla)
self.requires = ['IWLAPostAnalysisTopPages']
def load(self):
self.domain_re = re.compile(r'([^/]*)/.*')
return True
def hook(self):
month_stats = self.iwla.getMonthStats()
top_pages = month_stats['top_pages']
subdomains = {}
for (uri, count) in top_pages.items():
domain = self.domain_re.match(uri)
if not domain: continue
domain = domain.group(1)
subdomains[domain] = subdomains.get(domain, 0) + count
month_stats['subdomains'] = subdomains

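Because `top_pages` keys already start with the server name, grouping by subdomain is just a prefix capture up to the first `/` and a counter merge. The hook body as a standalone sketch (page counts invented):

```python
import re

domain_re = re.compile(r'([^/]*)/.*')

top_pages = {
    'blog.example.com/post-1': 5,
    'blog.example.com/post-2': 3,
    'www.example.com/index': 7,
}
subdomains = {}
for (uri, count) in top_pages.items():
    m = domain_re.match(uri)
    if not m:
        continue
    domain = m.group(1)
    subdomains[domain] = subdomains.get(domain, 0) + count
```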
View File

@ -75,7 +75,7 @@ class IWLAPostAnalysisTopPages(IPlugin):
uri = r['extract_request']['extract_uri']
if self.index_re.match(uri):
uri = '/'
uri = ''
uri = "%s%s" % (r.get('server_name', ''), uri)

View File

@ -19,32 +19,40 @@
#
import re
import time
from iwla import IWLA
from iplugin import IPlugin
"""
Post analysis hook
Pre analysis hook
Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
If merge_feeds_parsers is set to True, merge feeds parsers with the same user agent
as it must be the same person with a different IP address.
Warning : When merge_feeds_parsers is activated, the displayed last access date is the most
recent date of all merged parsers found
Plugin requirements :
None
Conf values needed :
feeds
feeds_referers*
feeds_agents*
merge_feeds_parsers*
Output files :
None
Statistics creation :
remote_addr =>
remote_ip =>
feed_parser
feed_name_analysed
feed_name_analyzed
feed_parser_last_access (for merged parser)
feed_domain
feed_uri
feed_subscribers
Statistics update :
None
@ -66,9 +74,10 @@ class IWLAPostAnalysisFeeds(IPlugin):
def load(self):
feeds = self.iwla.getConfValue('feeds', [])
feeds_referers = self.iwla.getConfValue('feeds_referers', [])
feeds_agents = self.iwla.getConfValue('feeds_agents', [])
self.merge_feeds_parsers = self.iwla.getConfValue('merge_feeds_parsers', False)
_merge_feeds_parsers_list = self.iwla.getConfValue('merge_feeds_parsers_list', [])
_no_merge_feeds_parsers_list = self.iwla.getConfValue('no_merge_feeds_parsers_list', [])
if feeds is None: return False
@ -84,41 +93,61 @@ class IWLAPostAnalysisFeeds(IPlugin):
self.user_agents_re.append(re.compile(r'.*atom.*'))
self.user_agents_re.append(re.compile(r'.*feed.*'))
self.referers_uri = []
for f in feeds_referers:
self.referers_uri.append(f)
for f in feeds_agents:
self.user_agents_re.append(re.compile(f))
self.bad_user_agents_re = []
self.bad_user_agents_re.append(re.compile(r'.*feedback.*'))
self.subscribers_re = re.compile(r'.* ([0-9]+) subscriber.*')
self.merge_feeds_parsers_list = []
for f in _merge_feeds_parsers_list:
self.merge_feeds_parsers_list.append(re.compile(f))
self.no_merge_feeds_parsers_list = []
for f in _no_merge_feeds_parsers_list:
self.no_merge_feeds_parsers_list.append(re.compile(f))
self.merged_feeds = {}
return True
def _appendToMergeCache(self, isFeedParser, key, hit):
hit['feed_parser'] = isFeedParser
# First time, register into dict
if self.merged_feeds.get(key, None) is None:
# Merged
self.merged_feeds[key] = hit
else:
elif hit['remote_ip'] != self.merged_feeds[key]['remote_ip']:
# Next time
# Current must be ignored
hit['feed_parser'] = self.NOT_A_FEED_PARSER
merged_hit = hit
last_access = hit['last_access']
# Previous matched hit must be set as merged
isFeedParser = self.MERGED_FEED_PARSER
hit = self.merged_feeds[key]
hit['feed_parser'] = isFeedParser
hit['feed_parser'] = self.MERGED_FEED_PARSER
hit['viewed_pages'][0] += merged_hit['viewed_pages'][0]
hit['viewed_hits'][0] += merged_hit['viewed_hits'][0]
hit['not_viewed_pages'][0] += merged_hit['not_viewed_pages'][0]
hit['not_viewed_hits'][0] += merged_hit['not_viewed_hits'][0]
if hit['last_access'] < merged_hit['last_access']:
hit['feed_parser_last_access'] = merged_hit['last_access']
else:
hit['feed_parser_last_access'] = hit['last_access']
def mergeFeedsParsers(self, isFeedParser, hit):
if isFeedParser:
# One hit only match
if True or (hit['viewed_hits'][0] + hit['not_viewed_hits'][0]) == 1:
for r in self.merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']):
#print('hit match %s' % (hit['remote_addr']))
self._appendToMergeCache(isFeedParser, r, hit)
return
if isFeedParser in (self.FEED_PARSER, self.MERGED_FEED_PARSER):
for r in self.no_merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']) or r.match(hit['requests'][0]['http_user_agent']):
return
for r in self.merge_feeds_parsers_list:
if r.match(hit['remote_addr']) or r.match(hit['remote_ip']) or r.match(hit['requests'][0]['http_user_agent']):
# One group can view multiple different feeds
key = r.pattern + hit.get('feed_domain', '') + hit.get('feed_uri', '')
self._appendToMergeCache(isFeedParser, key, hit)
return
#print("No match for %s : %d" % (hit['remote_addr'], hit['viewed_hits'][0] + hit['not_viewed_hits'][0]))
# Other cases, look for user agent
user_agent = hit['requests'][0]['http_user_agent'].lower()
@@ -129,52 +158,68 @@ class IWLAPostAnalysisFeeds(IPlugin):
for hit in hits.values():
isFeedParser = hit.get('feed_parser', None)
# Register already tagged feed parser in merged_feeds
if self.merge_feeds_parsers and\
not isFeedParser in (None, self.BAD_FEED_PARSER):
self.mergeFeedsParsers(isFeedParser, hit)
if isFeedParser == self.NOT_A_FEED_PARSER:
continue
# Second time
if isFeedParser:
if hit['feed_parser'] == self.BAD_FEED_PARSER: continue
if not hit.get('feed_name_analysed', False) and\
hit.get('dns_name_replaced', False):
hit['feed_name_analysed'] = True
addr = hit.get('remote_addr', None)
for r in self.bad_feeds_re:
if r.match(addr):
hit['feed_parser'] = self.BAD_FEED_PARSER
break
# Update last access time
if hit['last_access'] > hit.get('feed_parser_last_access', time.gmtime(0)):
hit['feed_parser_last_access'] = hit['last_access']
# Register already tagged feed parser in merged_feeds
if self.merge_feeds_parsers:
self.mergeFeedsParsers(isFeedParser, hit)
continue
request = hit['requests'][0]
isFeedParser = self.NOT_A_FEED_PARSER
uri = request['extract_request']['extract_uri'].lower()
for regexp in self.feeds_re:
if regexp.match(uri):
if regexp.match(uri) and self.iwla.hasBeenViewed(request):
isFeedParser = self.FEED_PARSER
# Robot that views pages -> bot
if hit['robot']:
if hit['not_viewed_pages'][0]:
isFeedParser = self.NOT_A_FEED_PARSER
# # Robot that views pages -> bot
# if hit['robot']:
# if hit['not_viewed_pages'][0]:
# isFeedParser = self.NOT_A_FEED_PARSER
break
user_agent = request['http_user_agent'].lower()
if isFeedParser == self.NOT_A_FEED_PARSER:
user_agent = request['http_user_agent'].lower()
for regexp in self.user_agents_re:
if regexp.match(user_agent):
isFeedParser = self.FEED_PARSER
break
if isFeedParser == self.NOT_A_FEED_PARSER and\
request.get('extract_referer', False):
referer = request['extract_referer']['extract_uri'].lower()
for uri in self.referers_uri:
if referer == uri:
isFeedParser = self.FEED_PARSER
if isFeedParser == self.FEED_PARSER:
for regexp in self.bad_user_agents_re:
if regexp.match(user_agent):
isFeedParser = self.NOT_A_FEED_PARSER
break
if isFeedParser == self.FEED_PARSER:
if not hit.get('dns_name_replaced', False):
self.iwla.reverseDNS(hit)
if not hit.get('feed_name_analyzed', False):
hit['feed_name_analyzed'] = True
addr = hit.get('remote_addr', None)
for r in self.bad_feeds_re:
if r.match(addr):
isFeedParser = self.NOT_A_FEED_PARSER
break
if isFeedParser == self.FEED_PARSER:
hit['feed_domain'] = request['server_name']
hit['feed_uri'] = uri
hit['feed_subscribers'] = 0
subscribers = self.subscribers_re.match(user_agent)
if subscribers:
hit['feed_subscribers'] = int(subscribers.groups()[0])
hit['robot'] = True
hit['feed_parser'] = isFeedParser
if self.merge_feeds_parsers:
self.mergeFeedsParsers(isFeedParser, hit)
else:
hit['feed_parser'] = isFeedParser
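The subscriber-count extraction used above can be sketched on its own. The `subscribers_re` pattern is the one added in `load()`; the user-agent strings are hypothetical examples of the Feedly-style headers it targets:

```python
import re

# Pattern from the plugin: captures the subscriber count some feed
# fetchers advertise in their User-Agent header.
subscribers_re = re.compile(r'.* ([0-9]+) subscriber.*')

def feed_subscribers(user_agent):
    """Return the advertised subscriber count, or 0 if absent."""
    m = subscribers_re.match(user_agent.lower())
    return int(m.groups()[0]) if m else 0

# Hypothetical user agents for illustration
print(feed_subscribers('Feedly/1.0 (+http://www.feedly.com/fetcher.html; 12 subscribers)'))  # 12
print(feed_subscribers('Mozilla/5.0 (compatible; Googlebot/2.1)'))  # 0
```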

View File

@@ -19,12 +19,13 @@
#
import socket
import re
from iwla import IWLA
from iplugin import IPlugin
"""
Post analysis hook
Pre analysis hook
Replace IP by reverse DNS names
@@ -32,7 +33,7 @@ Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
robot_domains*
Output files :
None
@@ -51,31 +52,28 @@ Statistics deletion :
"""
class IWLAPostAnalysisReverseDNS(IPlugin):
DEFAULT_DNS_TIMEOUT = 0.5
def __init__(self, iwla):
super(IWLAPostAnalysisReverseDNS, self).__init__(iwla)
self.API_VERSION = 1
def load(self):
timeout = self.iwla.getConfValue('reverse_dns_timeout',
IWLAPostAnalysisReverseDNS.DEFAULT_DNS_TIMEOUT)
socket.setdefaulttimeout(timeout)
self.robot_domains_re = []
robot_domains = self.iwla.getConfValue('robot_domains', [])
for domain in robot_domains:
self.robot_domains_re.append(re.compile(domain))
return True
def hook(self):
hits = self.iwla.getCurrentVisits()
for (k, hit) in hits.items():
if hit.get('dns_analysed', False): continue
if not hit.get('feed_parser', False) and\
not self.iwla.isValidVisitor(hit):
# Do reverse for feed parser even if they're not
# valid visitors
if hit.get('robot', False) and not hit.get('feed_parser', False):
continue
try:
name, _, _ = socket.gethostbyaddr(k)
hit['remote_addr'] = name.lower()
hit['dns_name_replaced'] = True
except:
pass
finally:
hit['dns_analysed'] = True
res = self.iwla.reverseDNS(hit)
for r in self.robot_domains_re:
if r.match(hit['remote_addr']):
hit['robot'] = True
break
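The resolve-with-timeout pattern this plugin relies on (now centralized in `iwla.reverseDNS`) can be sketched as a simplified standalone helper — an assumption-laden sketch, not the plugin's exact code:

```python
import socket

def reverse_dns(ip, timeout=0.5):
    """Resolve an IP to a hostname; return (name, True) on success,
    (ip, False) on failure or timeout. Simplified sketch."""
    socket.setdefaulttimeout(timeout)  # applies to resolver calls made afterwards
    try:
        name, _, _ = socket.gethostbyaddr(ip)
        return name.lower(), True
    except OSError:  # covers socket.herror, socket.gaierror, timeouts
        return ip, False
```

The resolved name can then be matched against the `robot_domains` regexps, as the hook above does, to flag the visitor as a robot.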

View File

@@ -36,7 +36,8 @@ Plugin requirements :
None
Conf values needed :
None
count_hit_only_visitors
no_referrer_domains
Output files :
None
@@ -55,15 +56,19 @@ Statistics deletion :
"""
class IWLAPreAnalysisRobots(IPlugin):
def __init__(self, iwla):
super(IWLAPreAnalysisRobots, self).__init__(iwla)
self.API_VERSION = 1
def load(self):
self.awstats_robots = list(map(lambda x : re.compile(('.*%s.*') % (x), re.IGNORECASE), awstats_data.robots))
self.robot_re = re.compile(r'.*bot.*', re.IGNORECASE)
self.crawl_re = re.compile(r'.*crawl.*', re.IGNORECASE)
self.compatible_re = []
self.compatible_re.append(re.compile(r'.*\(.*compatible; ([^;]+);.*\).*'))
self.compatible_re.append(re.compile(r'.*\(.*compatible; (.*)\).*'))
self.compatible_re.append(re.compile(r'.*\(([^;]+); \+.*\).*'))
self.compatible_re.append(re.compile(r'(.*); \(\+.*\)*'))
self.logger = logging.getLogger(self.__class__.__name__)
self.one_hit_only = self.iwla.getConfValue('count_hit_only_visitors', False)
self.no_referrer_domains = self.iwla.getConfValue('no_referrer_domains', [])
return True
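The `compatible_re` list added in `load()` extracts a robot's advertised name from common User-Agent layouts. A quick standalone check, using the same patterns (the agent strings are illustrative):

```python
import re

# The four patterns registered by the plugin, in order of priority.
compatible_re = [
    re.compile(r'.*\(.*compatible; ([^;]+);.*\).*'),
    re.compile(r'.*\(.*compatible; (.*)\).*'),
    re.compile(r'.*\(([^;]+); \+.*\).*'),
    re.compile(r'(.*); \(\+.*\)*'),
]

def robot_name(agent):
    """Return the robot name embedded in a 'compatible' token, or None."""
    for r in compatible_re:
        m = r.match(agent)
        if m:
            return m[1]
    return None

print(robot_name('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'))  # Googlebot/2.1
print(robot_name('Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0'))  # None
```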
@@ -73,26 +78,36 @@ class IWLAPreAnalysisRobots(IPlugin):
info = inspect.getframeinfo(frame)
self.logger.debug('%s is a robot (caller %s:%d)' % (k, info.function, info.lineno))
super_hit['robot'] = 1
super_hit['robot'] = True
super_hit['keep_requests'] = False
agent = super_hit['requests'][0]['http_user_agent']
for compatible_re in self.compatible_re:
robot_name = compatible_re.match(agent)
if robot_name:
super_hit['robot_name'] = robot_name[1]
break
# Basic rule to detect robots
def hook(self):
hits = self.iwla.getCurrentVisits()
for (k, super_hit) in hits.items():
if super_hit['robot']:
self.logger.debug('%s is a robot' % (k))
# Already analyzed
if super_hit.get('robot', None) in (True, False):
if super_hit['robot'] == True:
self.logger.debug('%s is a robot' % (k))
continue
if super_hit.get('feed_parser', False):
self.logger.debug('%s is feed parser' % (k))
continue
super_hit['robot'] = False
isRobot = False
referers = 0
first_page = super_hit['requests'][0]
if self.robot_re.match(first_page['http_user_agent']) or\
self.crawl_re.match(first_page['http_user_agent']):
self.logger.debug(first_page['http_user_agent'])
@@ -110,12 +125,18 @@ class IWLAPreAnalysisRobots(IPlugin):
continue
# 1) no pages view --> robot
# if not super_hit['viewed_pages'][0]:
# super_hit['robot'] = 1
# continue
if not self.one_hit_only and not super_hit['viewed_pages'][0]:
self._setRobot(k, super_hit)
continue
# 2) Less than 1 hit per page
if super_hit['viewed_pages'][0] and (super_hit['viewed_hits'][0] < super_hit['viewed_pages'][0]):
isRobot = True
# 2.5) 1 page, 1 hit
elif super_hit['viewed_pages'][0] == 1 and super_hit['viewed_hits'][0] == 1:
isRobot = True
if isRobot:
self._setRobot(k, super_hit)
continue
@@ -124,30 +145,42 @@ class IWLAPreAnalysisRobots(IPlugin):
self._setRobot(k, super_hit)
continue
not_found_pages = 0
error_codes = 0
not_modified_pages = 0
for hit in super_hit['requests']:
# 5) /robots.txt read
if hit['extract_request']['http_uri'].endswith('/robots.txt'):
self._setRobot(k, super_hit)
break
if int(hit['status']) == 404 or int(hit['status']) == 403:
not_found_pages += 1
# Exception for favicon.png and all apple-*icon*
if int(hit['status']) >= 400 and int(hit['status']) <= 499 and\
'icon' not in hit['extract_request']['http_uri']:
error_codes += 1
elif int(hit['status']) in (304,):
not_modified_pages += 1
# 6) Any referer for hits
if not hit['is_page'] and hit['http_referer']:
if not hit['is_page'] and hit['http_referer'] not in ('', '-'):
referers += 1
if isRobot:
self._setRobot(k, super_hit)
continue
# 7) more than 10 404/403 pages
if not_found_pages > 10:
# 6) Any referer for hits
if super_hit['viewed_hits'][0] and not referers and\
not super_hit['requests'][0]['server_name'] in self.no_referrer_domains:
self._setRobot(k, super_hit)
continue
if not super_hit['viewed_pages'][0] and \
(super_hit['viewed_hits'][0] and not referers):
# 7) more than 10 4XX or 304 pages
if error_codes > 10 or not_modified_pages > 50:
self._setRobot(k, super_hit)
continue
# 8) Special case : 1 page and 1 hit, but not from the same source
if (super_hit['viewed_pages'][0] == 1 and super_hit['viewed_hits'][0] == 1 and len(super_hit['requests']) == 2) and\
(super_hit['requests'][0]['server_name'] != super_hit['requests'][1]['server_name']):
self._setRobot(k, super_hit)
continue
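The numbered rules in the hook above (no pages viewed, fewer hits than pages, the new "1 page and 1 hit" case, error-code thresholds) can be condensed into a standalone predicate. This is a deliberate simplification over a hypothetical `visit` dict, not the plugin's exact logic:

```python
def looks_like_robot(visit, count_hit_only_visitors=False):
    """Heuristic robot check mirroring the plugin's numbered rules (sketch)."""
    pages, hits = visit['viewed_pages'], visit['viewed_hits']
    # 1) No pages viewed at all -> robot (unless hit-only visitors are counted)
    if not count_hit_only_visitors and not pages:
        return True
    # 2) Fewer hits than pages (pages fetched without their resources) -> robot
    if pages and hits < pages:
        return True
    # 2.5) Exactly one page and one hit -> robot
    if pages == 1 and hits == 1:
        return True
    # 7) Many 4XX errors or 304 responses -> robot
    if visit.get('error_codes', 0) > 10 or visit.get('not_modified', 0) > 50:
        return True
    return False

# Hypothetical visits for illustration
print(looks_like_robot({'viewed_pages': 1, 'viewed_hits': 1}))   # True
print(looks_like_robot({'viewed_pages': 3, 'viewed_hits': 12}))  # False
```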

View File

@@ -68,6 +68,7 @@ td:first-child
.iwla_search { background : #F4F090; }
.iwla_weekend { background : #ECECEC; }
.iwla_curday { font-weight: bold; }
.iwla_curday > a { font-weight: bold; color:black}
.iwla_others { color: #668; }
.iwla_update { background : orange; }
.iwla_new { background : green }

Binary image files changed — not shown.