Difference between revisions of "LCS in context of ML- Martin Mihál"

', '', 'stayMoreThan14', 'stayMoreThan21',
      'stayMoreThan28', 'fromMobile', 'fromDesktop', 'HadDirectVisits',
      'HadSocialVisits', 'HadExternalVisits', 'HadSearchVisits', 'isIOSuser',
      'visitSection13', 'visitSection12', 'visitSection11', 'visitSection10',
      'visitSection9', 'visitSection8', 'visitSection7', 'visitSection6',
      'visitSection5', 'visitSection4', 'visitSection3', 'visitSection2',
      'visitSection1', 'fromUK', 'fromBrazil', 'fromUSA', 'fromPortugal',
      'fromOther', 'isAdblockUser', 'fromLisbao', 'fromPorto', 'fromBraga',
      'fromCoimbra', 'feb_visits', 'rand'
 
 
I'll use just a random sample of users (~10%) because of the extremely large number of users on the mentioned website.
 
  
 

Revision as of 17:59, 29 May 2017

Goal of project

The goal of the project is to show the possibilities of the LCS algorithm in a non-robotic environment, in the area of what we can call "pure" machine learning. I'll use user activity data for a given website (http://www.publico.es/) from the Piano (http://piano.io/) system. The goal of the whole process is to define the set of rules that best describes users who will come back to the website. Such users are "loyal" in some way, so we can try to target them by locking some content and delivering an offer. On the other side, users who are not loyal yet (they probably won't come back) should get freedom in browsing the website so that they develop interest or "addiction", and we give them an offer later.

Overview

I'll use user activity data from period A (one month in this experiment) to predict whether a user will have any interaction in the following period B. There is a gap of some days between periods A and B because we want to track "long-term" loyalty, not "from-day-to-day" loyalty.
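The split into periods can be sketched roughly as follows. This is only an illustration: the exact dates of periods A and B, the length of the gap, the event format and the helper names (`count_pageviews`, `label_users`) are all assumptions, not something taken from the Piano data.

```python
from collections import Counter
from datetime import date

# Hypothetical period boundaries; the text only says that period A is a
# month and that some days separate it from period B.
PERIOD_A = (date(2017, 1, 1), date(2017, 1, 31))
PERIOD_B = (date(2017, 2, 8), date(2017, 2, 28))


def count_pageviews(events, period):
    """Count pageviews per user whose date falls inside the period (inclusive)."""
    start, end = period
    return Counter(user for user, day in events if start <= day <= end)


def label_users(events):
    """Return {user: (pv_in_a, pv_in_b, came_back)} for users seen in period A."""
    pv_a = count_pageviews(events, PERIOD_A)
    pv_b = count_pageviews(events, PERIOD_B)
    return {user: (n, pv_b[user], pv_b[user] > 0) for user, n in pv_a.items()}


# Toy events as (user_id, date of pageview); invented for illustration.
events = [
    ("u1", date(2017, 1, 3)),
    ("u1", date(2017, 1, 20)),
    ("u1", date(2017, 2, 10)),
    ("u2", date(2017, 1, 15)),
    ("u2", date(2017, 2, 3)),   # falls in the gap, so it is ignored
    ("u3", date(2017, 2, 12)),  # not seen in period A, so not labeled
]
labels = label_users(events)
```

Pageviews that fall into the gap between the two periods are deliberately not counted, so a user who returns only a day or two after period A is not treated as "long-term" loyal.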

Preparing of data

In the process of preparing data it is important to take 2 factors into consideration:

  1. Understanding of the data we have and the problem we want to solve
  2. The type of data that the algorithm used needs


As a target value I will use the number of pageviews (PV) in period B.

To describe users I'm going to use the following data from period A. All features are binary (1/0) because that is the data format LCS needs, except the number of PV in period A, which I will use in a different way. All selected variables have a background in the data: for example, we want to track whether people from different cities/countries have different properties, we also check how many days there are between the first and last PV in period A, whether they visit particular sections, etc.:

  1. jan_visits - the only non-binary feature - number of PV in period A
  2. stayMoreThan7 - if the difference between the first and last PV was more than 7 days in period A
  3. stayMoreThan14 - if the difference between the first and last PV was more than 14 days in period A
  4. stayMoreThan21 - if the difference between the first and last PV was more than 21 days in period A
  5. stayMoreThan28 - if the difference between the first and last PV was more than 28 days in period A
  6. fromMobile - if the user is on a mobile or tablet device
  7. fromDesktop - if the user is on a desktop device
  8. HadDirectVisits - if the user visited the website directly at least once (typed the URL in the browser)
  9. HadSocialVisits - if the user came from a social website like FB, Twitter etc. at least once
  10. HadExternalVisits - if the user came from a different source at least once
  11. HadSearchVisits - if the user came from a search engine at least once
  12. isIOSuser - if the user is an iOS user
  13. visitSection13 - if the user visited the 13th most popular section of the website at least once
  14. visitSection12 - if the user visited the 12th most popular section of the website at least once
  15. visitSection11 - if the user visited the 11th most popular section of the website at least once
  16. visitSection10 - if the user visited the 10th most popular section of the website at least once
  17. visitSection9 - if the user visited the 9th most popular section of the website at least once
  18. visitSection8 - if the user visited the 8th most popular section of the website at least once
  19. visitSection7 - if the user visited the 7th most popular section of the website at least once
  20. visitSection6 - if the user visited the 6th most popular section of the website at least once
  21. visitSection5 - if the user visited the 5th most popular section of the website at least once
  22. visitSection4 - if the user visited the 4th most popular section of the website at least once
  23. visitSection3 - if the user visited the 3rd most popular section of the website at least once
  24. visitSection2 - if the user visited the 2nd most popular section of the website at least once
  25. visitSection1 - if the user visited the 1st most popular section of the website at least once
  26. fromUK - if the user is from the UK
  27. fromBrazil - if the user is from Brazil
  28. fromUSA - if the user is from the USA
  29. fromPortugal - if the user is from Portugal
  30. fromOther - if the user is not from Portugal, the USA, Brazil or the UK (most users are from those countries)
  31. isAdblockUser - if the user is using an ad blocker
  32. fromLisbao - if the user is from the city of Lisbon
  33. fromBraga - if the user is from the city of Braga
  34. fromCoimbra - if the user is from the city of Coimbra
  35. feb_visits - TARGET VALUE - how many PV the given user has in February

Data selection

TODO

LCS algorithm

TODO

Results

TODO