Discussion:
Chebyshev distance for K-means
(too old to reply)
Sandro Saitta
2007-05-01 08:43:50 UTC
Permalink
Hello,

I want to try k-means with the chebyshev distance. I know it is
possible to use euclidean and city block distances but what about
chebyshev? Is there a way to use this distance with the standard
k-means matlab function?

Thanks.
Sandro.
Peter Perkins
2007-05-01 10:22:11 UTC
Permalink
Post by Sandro Saitta
Hello,
I want to try k-means with the chebyshev distance. I know it is
possible to use euclidean and city block distances but what about
chebyshev? Is there a way to use this distance with the standard
k-means matlab function?
K-means (the algorithm) requires two things:

1) a distance measure, and
2) a way to compute the centroid of a cluster so as to minimize the sum of
point-to-centroid distances.

There are lots of distance measures, but few have a simple computation for the
centroid. For example, the default distance used by KMEANS (the function) is
most definitely NOT "Euclidean", but rather "squared Euclidean", because for the
latter, the centroid is the arithmetic mean, while for the former, it is a
difficult computation requiring an iterative solution (the calculation has a
hyphenated name attached to it that escapes me).

That's why no Chebyshev distance in KMEANS.

That being said, there is a "coordinate-free" version of K-means (the algorithm)
that does not require a centroid calculation. However, it is not the standard
thing that most people want, and requires a distance matrix rather than raw
data, limiting its use for large datasets (which is what most people use K-means
for). Is that of interest to you?

It's certainly also possible to modify KMEANS to use Chebyshev distance and some
unsuitable centroid calculation, but you're on your own there.

- Peter Perkins
The MathWorks, Inc.
Sandro Saitta
2007-05-01 12:22:02 UTC
Permalink
Thanks for your answer. I'm not sure to understand all your
explanations. The Manhattan, Euclidean (more precisely SqEuclidean)
and Chebyshev are three special cases of the Minkowski distance.

If p is the parameter of the Minkowski distance, then Euclidean
distance corresponds to p=2, Manhattan to p=1 and Chebyshev to
P=infinity.

I base myself on a paper which compares clustering with Euclidean,
Manhattan and Chenyshev distances for clustering using SOM (that why
I thought of comparing Chebyyshev with sqEuclidean). In a book by
Webb, Chebyshev is defined as:

d(x,y) = sum(abs(x_i - y_i))

Starting from this, I don't understand why it could not simply
replace the standard sqEuclidean distance.

Thanks in advance for your help.
Regards.
Peter Perkins
2007-05-01 15:20:57 UTC
Permalink
Post by Sandro Saitta
Thanks for your answer. I'm not sure to understand all your
explanations. The Manhattan, Euclidean (more precisely SqEuclidean)
and Chebyshev are three special cases of the Minkowski distance.
If p is the parameter of the Minkowski distance, then Euclidean
distance corresponds to p=2, Manhattan to p=1 and Chebyshev to
P=infinity.
I base myself on a paper which compares clustering with Euclidean,
Manhattan and Chenyshev distances for clustering using SOM (that why
I thought of comparing Chebyyshev with sqEuclidean). In a book by
d(x,y) = sum(abs(x_i - y_i))
Starting from this, I don't understand why it could not simply
replace the standard sqEuclidean distance.
What's the formula to compute the centroid? For example, the centroid for city
block distance is NOT the componentwise arithmetic mean, it's the componentwise
median. If you can compute an appropriate centroid easily for the Chebyshev
distance, then you're on your way. I haven't thought about this at all, and
there may an obvious answer for the centroid.

K-means partitions your data. The criterion for the partition is that the sum
of the point-to-centroid distances is minimized over all partitions. It makes
little or no sense to find a partition that minimizes that criterion if the
centroid itself is not the point that minimizes the point-to-centroid distances
within a cluster.
Sandro Saitta
2007-05-02 08:06:03 UTC
Permalink
Thanks for your help. I didn't see the problem like this. Therefore,
it means that if I want to compare K-means with different distance
function I should choose measures such as Euclidean, Manhattan and
Correlation for example.
Ashraful Alam
2013-09-16 15:39:06 UTC
Permalink
Post by Sandro Saitta
Thanks for your answer. I'm not sure to understand all your
explanations. The Manhattan, Euclidean (more precisely SqEuclidean)
and Chebyshev are three special cases of the Minkowski distance.
If p is the parameter of the Minkowski distance, then Euclidean
distance corresponds to p=2, Manhattan to p=1 and Chebyshev to
P=infinity.
I base myself on a paper which compares clustering with Euclidean,
Manhattan and Chenyshev distances for clustering using SOM (that why
I thought of comparing Chebyyshev with sqEuclidean). In a book by
d(x,y) = sum(abs(x_i - y_i))
Starting from this, I don't understand why it could not simply
replace the standard sqEuclidean distance.
Thanks in advance for your help.
Regards.
Ashraful Alam
2013-09-16 15:44:07 UTC
Permalink
hello sir,
i am making my project on image segmentation using fcm clustering method. i have read all the discussion described above. i am intereted to know that " Can chebyshev distance measure be applied on fcm clustering method, if so then what will be the matlab funtion for fcm clsterin method?

thank you sir
alex kullik
2008-05-12 13:54:03 UTC
Permalink
Post by Peter Perkins
Post by Sandro Saitta
Hello,
I want to try k-means with the chebyshev distance. I
know it is
Post by Peter Perkins
Post by Sandro Saitta
possible to use euclidean and city block distances but
what about
Post by Peter Perkins
Post by Sandro Saitta
chebyshev? Is there a way to use this distance with the
standard
Post by Peter Perkins
Post by Sandro Saitta
k-means matlab function?
1) a distance measure, and
2) a way to compute the centroid of a cluster so as to
minimize the sum of
Post by Peter Perkins
point-to-centroid distances.
There are lots of distance measures, but few have a simple
computation for the
Post by Peter Perkins
centroid. For example, the default distance used by
KMEANS (the function) is
Post by Peter Perkins
most definitely NOT "Euclidean", but rather "squared
Euclidean", because for the
Post by Peter Perkins
latter, the centroid is the arithmetic mean, while for the
former, it is a
Post by Peter Perkins
difficult computation requiring an iterative solution (the
calculation has a
Post by Peter Perkins
hyphenated name attached to it that escapes me).
That's why no Chebyshev distance in KMEANS.
That being said, there is a "coordinate-free" version of
K-means (the algorithm)
Post by Peter Perkins
that does not require a centroid calculation. However, it
is not the standard
Post by Peter Perkins
thing that most people want, and requires a distance
matrix rather than raw
Post by Peter Perkins
data, limiting its use for large datasets (which is what
most people use K-means
Post by Peter Perkins
for). Is that of interest to you?
I need this "coordinate-free" version of kmeans! Where can i
get it?
y***@gmail.com
2018-05-14 06:56:01 UTC
Permalink
Post by Peter Perkins
Post by Sandro Saitta
Hello,
I want to try k-means with the chebyshev distance. I know it is
possible to use euclidean and city block distances but what about
chebyshev? Is there a way to use this distance with the standard
k-means matlab function?
1) a distance measure, and
2) a way to compute the centroid of a cluster so as to minimize the sum of
point-to-centroid distances.
There are lots of distance measures, but few have a simple computation for the
centroid. For example, the default distance used by KMEANS (the function) is
most definitely NOT "Euclidean", but rather "squared Euclidean", because for the
latter, the centroid is the arithmetic mean, while for the former, it is a
difficult computation requiring an iterative solution (the calculation has a
hyphenated name attached to it that escapes me).
That's why no Chebyshev distance in KMEANS.
That being said, there is a "coordinate-free" version of K-means (the algorithm)
that does not require a centroid calculation. However, it is not the standard
thing that most people want, and requires a distance matrix rather than raw
data, limiting its use for large datasets (which is what most people use K-means
for). Is that of interest to you?
It's certainly also possible to modify KMEANS to use Chebyshev distance and some
unsuitable centroid calculation, but you're on your own there.
- Peter Perkins
The MathWorks, Inc.
Hi ,
I still did not intuitively understand, why the centroid calculation should be depend on the distance measure used.Why it can still be the usual centroid calculation. Can you explain with a example ?
Loading...