Tips for basic data analysis in Python

  • file header

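[code language="python"]
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # 3D axes support for the 3D scatter plot
from numpy import genfromtxt
from sklearn import preprocessing
from sklearn.decomposition import PCA
# sklearn.lda was removed from scikit-learn; LDA now lives in discriminant_analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
[/code]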

  • load data from csv file

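[code language="python"]
# Read every cell as text so the header row and label column survive
raw_sensor = genfromtxt('sensor.csv', dtype=str, delimiter=',')
string_sensor = raw_sensor[1:, 1:9]  # skip the first row (header), keep columns 1 to 8
sensor = string_sensor.astype(float)  # np.float was removed in NumPy 1.24
# Columns 0-2: acceleration, 3-5: gyroscope, 6 onward: magnetic.
# The slice 6:8 selects columns 6 and 7 of the first 100 rows.
X = sensor[:100, 6:8]
[/code]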
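The same loading pattern can be tried on a tiny in-memory CSV, which makes the slicing easy to check without the real sensor.csv. This is only a sketch: the column names and values below are made up for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for sensor.csv: a header row, an id column,
# two numeric columns and a text label column.
csv_text = StringIO(
    "id,ax,ay,label\n"
    "0,0.1,0.2,Sit\n"
    "1,0.3,0.4,Walk\n"
    "2,0.5,0.6,Lie\n"
)

# dtype=str keeps every cell as text, so numbers and labels share one array.
raw = np.genfromtxt(csv_text, dtype=str, delimiter=",")

# Drop the header row (row 0) and the id column (column 0),
# keep the two numeric columns, then convert the text to float.
values = raw[1:, 1:3].astype(float)
labels = raw[1:, 3]

print(values.shape)  # (3, 2)
print(labels)
```

Reading everything as text first and converting only the numeric slice avoids the NaN values that genfromtxt would otherwise produce for the label column.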

  • convert label to integer

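[code language="python"]
# Convert the text labels to integers
raw_label = raw_sensor[1:, 16]
le = preprocessing.LabelEncoder()
le.fit(["Sit", "Lie", "Stand", "Walk"])
target = le.transform(raw_label)
# LabelEncoder sorts its classes, so take the names from le.classes_
# to keep target_names[i] matched to integer label i
target_names = le.classes_
y = target[:100]  # labels for the same 100 rows selected for X
[/code]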
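One detail worth checking: LabelEncoder numbers the classes in sorted order, so "Lie" becomes 0 rather than "Sit". The sketch below illustrates that ordering with np.unique as a stand-in (it is not the scikit-learn code itself), using made-up labels.

```python
import numpy as np

# Hypothetical labels, in the order they might appear in the file.
raw_label = np.array(["Sit", "Walk", "Lie", "Stand", "Sit"])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the index of each original label within that sorted list.
classes, codes = np.unique(raw_label, return_inverse=True)

print(classes)  # ['Lie' 'Sit' 'Stand' 'Walk']
print(codes)    # [1 3 0 2 1]

# Indexing the class array with the codes recovers the original labels.
recovered = classes[codes]
```

A list of display names for plots therefore has to follow the sorted class order, otherwise the legend entries end up attached to the wrong points.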

  • plot one- or two-dimensional data as a scatter diagram

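[code language="python"]
# X_r is assumed to be a PCA projection of X, e.g. PCA(n_components=1).fit_transform(X);
# y needs one label per row of X_r
for c, i, target_name in zip("rgby", [0, 1, 2, 3], target_names):
    # plt.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, label=target_name)  # for 2-dimensional data
    pts = X_r[y == i]  # keep only the points of class i
    plt.plot(pts, np.zeros_like(pts), 'x', c=c, label=target_name)  # for 1-dimensional data
plt.legend()
plt.title('PCA of Magnetic dataset')
[/code]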
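The plotting code above expects X_r to already hold a PCA projection. As a rough sketch of what a one-component projection computes, the steps can be written with plain NumPy and an SVD instead of scikit-learn. The data here is synthetic, the names X_demo and X_r_demo are made up, and the sign of the projected values may differ from what scikit-learn returns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data stretched along the first axis (illustrative only).
X_demo = rng.normal(size=(100, 2)) * np.array([5.0, 0.5])

# Step 1: center each column on its mean.
X_centered = X_demo - X_demo.mean(axis=0)

# Step 2: the right singular vectors (rows of Vt) are the principal
# directions, ordered by decreasing singular value.
_, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: project onto the first direction to get 1-D coordinates.
X_r_demo = X_centered @ Vt[:1].T

print(X_r_demo.shape)  # (100, 1)
```

The result has one column, which is why the 1-dimensional plot draws it against a row of zeros.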

  • plot two-dimensional data as a scatter diagram (style 2)

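[code language="python"]
plt.figure(2, figsize=(8, 6))

# Axis limits with a small margin around the data (the 0.5 margin is an arbitrary choice)
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

# Plot the training points, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y)
# These axis labels refer to the iris dataset; rename them for your own data
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
[/code]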


  • plot a 3D scatter diagram

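[code language="python"]
# To get a better understanding of the interaction of the dimensions,
# plot the first three PCA dimensions
fig = plt.figure(1, figsize=(8, 6))
# Recent matplotlib versions create 3D axes through add_subplot
ax = fig.add_subplot(111, projection='3d', elev=-150, azim=110)
# The input was cut off in the original; `sensor` (all columns) is assumed here,
# since three components need at least three feature columns
X_reduced = PCA(n_components=3).fit_transform(sensor)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=target)
ax.set_title("First three PCA directions")
ax.set_xlabel("1st eigenvector")
ax.set_ylabel("2nd eigenvector")
ax.set_zlabel("3rd eigenvector")
[/code]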

  • Stack arrays in sequence vertically (row wise).

[code language="python"]
# Stack two columns as rows, then transpose (.T) so each becomes a column again
S = np.vstack((sensor[:, 1], sensor[:, 2])).T
[/code]

np.vstack takes a sequence of arrays and stacks them vertically to make a single array. It rebuilds arrays divided by vsplit.


[code language="python"]
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.vstack((a, b))
array([[1, 2, 3],
       [2, 3, 4]])
[/code]
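To see how the row layout, the transposed column layout and vsplit relate, here is a small sketch using the same two arrays. The variable names are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

stacked = np.vstack((a, b))       # shape (2, 3): each input becomes a row
as_columns = np.vstack((a, b)).T  # shape (3, 2): each input becomes a column

# np.column_stack builds the column layout directly for 1-D inputs.
same = np.column_stack((a, b))

# np.vsplit reverses vstack: split the 2 rows back into 2 arrays of shape (1, 3).
top, bottom = np.vsplit(stacked, 2)

print(stacked.shape, as_columns.shape, top.shape)
```

For 1-D inputs, np.column_stack may read more clearly than vstack followed by .T when the goal is a samples-by-features matrix.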
