51 Commits

Author SHA1 Message Date
Gregory Soutade
bde91ca936 Move reverse DNS core management into iwla.py + Add robot_domains configuration 2024-10-27 09:16:01 +01:00
Gregory Soutade
70de0d3aca Add no_merge_feeds_parsers_list conf value 2024-10-27 09:15:39 +01:00
Gregory Soutade
9939922c31 Move feeds and reverse_dns plugins from post_analysis to pre_analysis 2024-10-02 08:27:53 +02:00
Gregory Soutade
6d46ac4461 Robots: Improve compatible keyword detection for robots 2024-07-28 09:25:40 +02:00
Gregory Soutade
974d355dd4 Add no_referrer_domains list to default_conf for websites that define this policy 2024-01-30 11:24:52 +01:00
Gregory Soutade
16cd817fec Increase not modified page threshold for robot detection 2023-07-05 09:15:48 +02:00
Gregory Soutade
71d8ee2113 Forgot Firefox icon 2023-03-25 08:11:57 +01:00
Gregory Soutade
440f51ddd1 Remove robot rule 1 page for phones 2023-03-23 21:17:52 +01:00
Gregory Soutade
a0a1f42df4 Update robot detection plugin :
  * Analyze only once per month
  * Reactivate rule : no page view if count_hit_only_visitors is False
  * Add exception for "Less than 1 hit per page" rule if a phone is used
  * Check for all error codes in 400..499, not only 403 and 404
  * Referer '-' now counted as null
2023-03-11 20:48:17 +01:00
Gregory Soutade
c8dfdd17f7 Add "compatible" as a criterion for robots 2023-02-18 08:49:14 +01:00
Gregory Soutade
a5bef4ece6 Search for "compatible" in all requests, not only the first one 2023-02-18 08:48:57 +01:00
Gregory Soutade
21a21cd68f Add a new rule for robots : 1 page and 1 hit, but not from the same source 2023-02-04 08:40:04 +01:00
Gregory Soutade
6a4fd4e9c8 New rule for robot : more than 10 not modified pages in a row 2023-01-28 09:40:26 +01:00
Gregory Soutade
ac246eabe2 Find robot name in 'compatible' string and group them 2023-01-28 09:38:59 +01:00
Gregory Soutade
975cc66bd5 Don't launch robot analysis rules for feed parsers 2022-11-16 21:10:11 +01:00
Gregory Soutade
4d3c2107f0 Don't save all visitors' requests into database (saves space and computation). Can be changed in default_conf.py with keep_requests value 2022-06-23 21:16:30 +02:00
5130b1f6d8 Bad 2to3 Python conversion : map() results need to be wrapped in list(). If not, they're only analyzed once 2021-08-06 08:45:04 +02:00
Gregory Soutade
0c2ac431d1 Be more strict with robots : requires at least 1 hit per viewed page 2021-06-03 08:52:04 +02:00
f457f4e390 Update code for Python3 2020-10-30 14:42:56 +01:00
Gregory Soutade
bb268114b2 Make backup before compressing (low memory servers)
Fix error : Call post hook plugins even in display only mode
Don't compute unordered hits (remove past hits if they are found after the current one)
Remove tags in stats diff
Don't do geolocation if visitor is not valid
Don't try to find search engine on robots
Update robot check rules
Add top_pages_diff plugin
2019-08-30 07:50:54 +02:00
Gregory Soutade
007be71ad6 New format for (not_)viewed pages/hits and bandwidth that are now recorded by day (in a dictionary where only element 0 is initialized). Element 0 is the total. WARNING : not backward compatible with previous databases. 2017-08-24 07:55:53 +02:00
Gregory Soutade
68a67adecc Add one more rule to robot detection : more than ten 404 pages viewed 2017-05-25 21:04:18 +02:00
Gregory Soutade
12cc80208d Do merge 2016-02-06 14:45:09 +01:00
Gregory Soutade
4cb3b21ca5 Add reset feature
Allow opening .gz files transparently
Import debug in robots.py
2015-05-22 07:51:11 +02:00
Gregory Soutade
62be78845a Add debug traces in robots plugin 2015-05-13 18:13:18 +02:00
Gregory Soutade
df78a3f4cb [pre_analysis/robots] Don't check for an exact /robots.txt request, but use endswith /robots.txt for robot detection 2015-04-06 17:52:31 +02:00
Gregory Soutade
1d9bf71b4b Make arguments of page_to_hit optional 2015-01-13 18:54:57 +01:00
Gregory Soutade
4c74a14037 Filter robots with *bot* and *crawl* regular expressions 2015-01-11 18:06:44 +01:00
Grégory Soutadé
a35d462cb7 Replace # for module description by """ (help auto extraction) 2014-12-19 11:34:25 +01:00
e740bf1e45 Add licence information 2014-12-18 19:54:31 +01:00
Gregory Soutade
3a246d5cd6 Optimize analysis using reverse loop 2014-12-14 15:10:13 +01:00
4f1c09867d WIP 2014-12-10 07:09:05 +01:00
Grégory Soutadé
751a9b3fae Start big comments (post analysis / referers) 2014-12-09 16:54:02 +01:00
Grégory Soutadé
c87ddfb1aa Add hit_to_page_conf in addition to page_to_hit_conf 2014-11-27 13:46:58 +01:00
Grégory Soutadé
5ccc63c7ae Add hasBeenViewed() function 2014-11-27 13:07:14 +01:00
Grégory Soutadé
9fbc5448bc Add conf_requires.
Load plugins in order
2014-11-27 12:34:42 +01:00
Grégory Soutadé
dd8349ab08 Add option count_hit_only_visitors and function isValidForCurrentAnalysis() 2014-11-27 09:01:51 +01:00
6b0ed18f35 Remove viewed limitation in page_to_hit : skip good requests 2014-11-26 22:06:58 +01:00
fec5e375e4 Remove iwla parameter in hook functions 2014-11-26 20:31:13 +01:00
9571bf09b6 Work with time 2014-11-26 19:53:00 +01:00
Grégory Soutadé
e6b31fbf8a WIP 2014-11-26 16:56:33 +01:00
Grégory Soutadé
81b3eee552 Do a lot of things 2014-11-26 16:17:16 +01:00
Grégory Soutadé
7405cf237a Do a more generic plugin : page_to_hit 2014-11-25 16:22:07 +01:00
d5db763b48 Rework conf in plugins 2014-11-24 21:42:57 +01:00
549c0e5d97 Update conf management 2014-11-24 21:37:37 +01:00
Gregory Soutade
21a95cc2fa Rework plugins with classes 2014-11-24 17:13:59 +01:00
Gregory Soutade
670f024905 Add bytesToStr()
Automatically convert list into strings in appendRow()
Add package information
2014-11-24 13:44:04 +01:00
Gregory Soutade
e51e07f65e Very nice result 2014-11-21 16:56:58 +01:00
Gregory Soutade
7dada493ab Plugins OK 2014-11-21 10:41:29 +01:00
Gregory Soutade
f3cb04b16c Externalize plugins 2014-11-20 16:15:57 +01:00