{"id":56,"date":"2016-02-28T16:18:42","date_gmt":"2016-02-29T01:18:42","guid":{"rendered":"https:\/\/posuer000.wordpress.com\/?p=56"},"modified":"2016-02-28T16:18:42","modified_gmt":"2016-02-29T01:18:42","slug":"tips-for-basic-data-analysis-in-python","status":"publish","type":"post","link":"https:\/\/wanggengyu.com\/?p=56","title":{"rendered":"Tips for basic data analysis in Python"},"content":{"rendered":"<p>Source: python.org<\/p>\n<ul>\n<li>\n<h3>file header<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\nfrom sklearn.decomposition import PCA<br \/>\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA  # formerly sklearn.lda.LDA<br \/>\nfrom numpy import genfromtxt<br \/>\nfrom sklearn import preprocessing<br \/>\nimport numpy as np<br \/>\nimport matplotlib.pyplot as plt<br \/>\nfrom mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions<br \/>\n[\/code]<\/p>\n<ul>\n<li>\n<h3>load data from a CSV file<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\n# the label column holds text, so read every cell as a string first<br \/>\nraw_sensor = genfromtxt('sensor.csv', dtype=str, delimiter=',')<br \/>\nstring_sensor = raw_sensor[1:, 1:9]  # skip the first row, keep columns 1 to 8<br \/>\nsensor = string_sensor.astype(float)  # np.float was removed in NumPy 1.24<br \/>\nX = sensor[:100, 6:8]  # 0:2 acceleration data, 3:5 gyroscope data, 6:8 Magnetic data<br \/>\n[\/code]<br \/>\n<!--more Read More--><\/p>\n<ul>\n<li>\n<h3>convert labels to integers<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\n# convert labels to integers<br \/>\nraw_label = raw_sensor[1:, 16]<br \/>\nle = preprocessing.LabelEncoder()<br \/>\nle.fit([&quot;Sit&quot;, &quot;Lie&quot;, &quot;Stand&quot;, &quot;Walk&quot;])<br \/>\ntarget = le.transform(raw_label)<br \/>\n# LabelEncoder sorts the class names, so take them from le.classes_ to keep<br \/>\n# them aligned with the integer codes (Lie=0, Sit=1, Stand=2, Walk=3)<br \/>\ntarget_names = le.classes_<br \/>\ny = target<br \/>\n[\/code]<\/p>\n<ul>\n<li>\n<h3>plot one- or two-dimensional data as a scatter diagram<\/h3>\n<\/li>\n<\/ul>\n<p>[code 
language=\"python\"]<br \/>\n# X_r: X after dimensionality reduction, e.g. X_r = PCA(n_components=1).fit_transform(X)<br \/>\n# y: one integer label per row of X_r<br \/>\nplt.figure()<br \/>\nfor c, i, target_name in zip(&quot;rgby&quot;, [0, 1, 2, 3], target_names):<br \/>\n    # plt.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, label=target_name)  # for 2-dimensional data<br \/>\n    plt.plot(X_r[y == i], np.zeros_like(X_r[y == i]), 'x', c=c, label=target_name)  # for 1-dimensional data<br \/>\nplt.legend()<br \/>\nplt.title('PCA of Magnetic dataset')<br \/>\n[\/code]<\/p>\n<ul>\n<li>\n<h3>plot two-dimensional data as a scatter diagram (style 2)<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\n# X: (n_samples, 2) features; Y: one integer class label per sample<br \/>\nplt.figure(2, figsize=(8, 6))<br \/>\nplt.clf()<\/p>\n<p># Plot the training points<br \/>\nplt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)<br \/>\nplt.xlabel('Sepal length')<br \/>\nplt.ylabel('Sepal width')<\/p>\n<p># Axis limits: the data range plus a 0.5 margin<br \/>\nx_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5<br \/>\ny_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5<br \/>\nplt.xlim(x_min, x_max)<br \/>\nplt.ylim(y_min, y_max)<br \/>\nplt.xticks(())<br \/>\nplt.yticks(())<br \/>\n[\/code]<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-106\" src=\"https:\/\/posuer000.files.wordpress.com\/2016\/02\/e5beaee4bfa1e688aae59bbe_20160229100924.png\" alt=\"WeChat screenshot_20160229100924\" width=\"535\" height=\"421\" \/><\/p>\n<ul>\n<li>\n<h3>plot a 3D scatter diagram<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\nfrom sklearn.datasets import load_iris<br \/>\niris = load_iris()  # example data: the iris dataset<br \/>\nY = iris.target<br \/>\n# To get a better understanding of how the dimensions interact,<br \/>\n# plot the first three PCA dimensions<br \/>\nfig = plt.figure(1, figsize=(8, 6))<br \/>\nax = fig.add_subplot(111, projection='3d', elev=-150, azim=110)<br \/>\nX_reduced = PCA(n_components=3).fit_transform(iris.data)<br \/>\nax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=Y,<br \/>\n           cmap=plt.cm.Paired)<br \/>\nax.set_title(&quot;First three PCA directions&quot;)<br \/>\nax.set_xlabel(&quot;1st eigenvector&quot;)<br \/>\nax.xaxis.set_ticklabels([])<br \/>\nax.set_ylabel(&quot;2nd eigenvector&quot;)<br \/>\nax.yaxis.set_ticklabels([])<br \/>\nax.set_zlabel(&quot;3rd eigenvector&quot;)<br \/>\nax.zaxis.set_ticklabels([])<br \/>\n[\/code]<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-117\" src=\"https:\/\/posuer000.files.wordpress.com\/2016\/02\/e5beaee4bfa1e688aae59bbe_20160229101736.png\" alt=\"WeChat screenshot_20160229101736\" width=\"577\" height=\"460\" \/><\/p>\n<ul>\n<li>\n<h3>Stack arrays in sequence vertically (row wise).<\/h3>\n<\/li>\n<\/ul>\n<p>[code language=\"python\"]<br \/>\nS = np.vstack((sensor[:, 1], sensor[:, 2])).T  # vstack gives shape (2, n); .T transposes it to (n, 2)<br \/>\n[\/code]<\/p>\n<p>\nStack arrays in sequence vertically (row wise).<br \/>\nTake a sequence of arrays and stack them vertically to make a single array. Rebuilds arrays divided by vsplit.\n<\/p>\n<h5>example<\/h5>\n<p>[code language=\"python\"]<br \/>\n&gt;&gt;&gt; a = np.array([1, 2, 3])<br \/>\n&gt;&gt;&gt; b = np.array([2, 3, 4])<br \/>\n&gt;&gt;&gt; np.vstack((a,b))<br \/>\narray([[1, 2, 3],<br \/>\n       [2, 3, 4]])<br \/>\n[\/code]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>source from python.org file header [code language=&#8221;python&#8221;] from sklearn.decomposition import PCA from sklearn.lda import LDA from numpy import genfromtxt from sklearn import preprocessing import numpy as np [\/code] load data from csv file [code language=&#8221;python&#8221;] raw_sensor = genfromtxt(&#8216;sensor.csv&#8217;, dtype=&#8217;string&#8217;, delimiter=&#8217;,&#8217;) string_sensor = raw_sensor[1:,1:9] sensor = string_sensor.astype(np.float) X = sensor[:100, 
6:8]#0:2&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-56","post","type-post","status-publish","format-standard","hentry","category-techniques"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/wanggengyu.com\/index.php?rest_route=\/wp\/v2\/posts\/56","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wanggengyu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wanggengyu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wanggengyu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wanggengyu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=56"}],"version-history":[{"count":0,"href":"https:\/\/wanggengyu.com\/index.php?rest_route=\/wp\/v2\/posts\/56\/revisions"}],"wp:attachment":[{"href":"https:\/\/wanggengyu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=56"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wanggengyu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=56"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wanggengyu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=56"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}